Abstract

We review how Shannon's classical notion of capacity is not enough to characterize a noisy communication channel if the 
channel is intended to be used as part of a feedback loop to stabilize an unstable scalar linear system. While classical capacity 
is not enough, another sense of capacity (parametrized by reliability) called "anytime capacity" is shown to be necessary for 
the stabilization of an unstable process. The required rate is given by the log of the unstable system gain and the required 
reliability comes from the sense of stability desired. A consequence of this necessity result is a sequential generalization of the 
Schalkwijk/Kailath scheme for communication over the AWGN channel with feedback. 

In cases of sufficiently rich information patterns between the encoder and decoder, adequate anytime capacity is also shown to 
be sufficient for there to exist a stabilizing controller. These sufficiency results are then generalized to cases with noisy observations, 
delayed control actions, and without any explicit feedback between the observer and the controller. Both necessary and sufficient 
conditions are extended to continuous time systems as well. We close with comments discussing a hierarchy of difficulty for 
communication problems and how these results establish where stabilization problems sit in that hierarchy.

Index Terms 

Real-time information theory, reliability functions, error exponents, feedback, anytime decoding, sequential coding, control 
over noisy channels 
The necessity and sufficiency of anytime capacity for stabilization of a linear system over a noisy communication link
Part I: scalar systems



I. Introduction 

For communication theorists, Shannon's classical channel 
capacity theorems are not just beautiful mathematical results, 
they are useful in practice as well. They let us summarize 
a diverse range of channels by a single figure of merit: 
the capacity. For most non-interactive point-to-point com- 
munication applications, the Shannon capacity of a channel 
provides an upper bound on performance in terms of end-to- 
end distortion through the distortion-rate function. As far as 
distortion is concerned, all that matters is the channel capacity 
and the nature of the source. Given enough tolerance for 
end-to-end delay, the source can be encoded into bits and 
those bits can be reliably transported across the noisy channel 
if the rate is less than the Shannon capacity. As long as 
the source, distortion, and channel are well-behaved [1], [2],
there is asymptotically no loss in separating the problems of 
source and channel coding. This provides a justification for the 
layered architecture that lets engineers isolate the problem of 
reliable communication from that of using the communicated 
information. Recent advances in coding theory have also made 
it possible to approach the capacity bounds very closely in 
practical systems. 

In order to extend our understanding of communication to 
interactive settings, it is essential to have some model for 
interaction. Schulman and others have studied interaction in 
the context of distributed computation [3], [4]. The interaction 
there is between computational agents that have access to some 
private data and wish to perform a global computation in a 
distributed way. The computational agents can only commu- 
nicate with each other through noisy channels. In Schulman's 
formulation, capacity is not a question of major interest 
since constant factor slowdowns are considered acceptable. 1 
Fundamentally, this is a consequence of being able to design 
all the system dynamics. The rich field of automatic control 
provides an interactive context to study capacity requirements 
since the plant dynamics are given, rather than something that 
can be designed. In control, we consider interaction between 
an observer that gets to see the plant and a controller that gets 
to control it. These two can be connected by a noisy channel. 

Shannon himself had suggested looking to control problems 
for more insight into reliable communication [5]. 

"... can be pursued further and is related to a 
duality between past and future 2 and the notions of 
control and knowledge. Thus we may have knowl- 
edge of the past and cannot control it; we may 
control the future but have no knowledge of it." 



1 Furthermore, such constant factor slowdowns appear to be unavoidable
when facing the very general class of interactive computational problems. 
2 The differing roles of the past and future are made clear in [6]. 



We are far from the first to attempt to bring together 
information and control theory. In [7], Ho, Kastner, and Wong 
drew out a detailed diagram in which they summarized the 
then known relationships among team theory, signaling, and 
information theory from the perspective of distributed control. 
Rather than taking such a broad perspective, we instead 
ask whether Shannon's classical capacity is the appropriate 
characterization for communication channels arising in dis- 
tributed control systems. Our interest is in understanding the 
fundamental relationship between problems of stabilization 
and problems of communication. 

Tatikonda's recent work on sequential rate distortion theory 
provides an information-theoretic lower-bound on the achiev- 
able performance of a control system over a channel. Because 
this bound is sometimes infinite, it also implies that there 
is a fundamental rate of information production, namely the 
sum of the logs of the unstable eigenvalues of the plant, 
that is invariantly attached to an unstable linear discrete-time 
process [8], [9]. This particular notion of rate was justified 
by showing how to stabilize the system over a noiseless 
feedback link with capacity greater than the intrinsic rate for 
the unstable process. 3 Nair et al. extended this to cover the 
case of unbounded disturbances and observation noise under 
suitable conditions [10], [11]. In addition to noiseless channels,
the results were extended for almost-sure stabilization in the 
context of undisturbed 4 control systems with bounded initial 
conditions being stabilized over certain noisy channels [12]. 

We had previously shown that it is possible to stabilize
persistently disturbed controlled Gauss-Markov processes over
suitable power-constrained AWGN (Additive White Gaussian
Noise) channels [13], [14] where it turns out that Shannon
capacity is tight and linear observers and controllers are 
sufficient to achieve stabilization [15]. In contrast, we showed 
that the Shannon capacity of the binary erasure channel 
(BEC) is not sufficient to check stabilizability and introduced 
the anytime capacity as a candidate figure of merit [16]. 
Following up on our treatment of the BEC case, Martins et 
al. have studied more general erasure-type models and have 
also incorporated bounded model uncertainty in the plant [17]. 
There is also related work by Elia that uses ideas from robust 
control to deal with communication uncertainty in a mixed 
continuous/discrete context, but restricting to linear operations 
[18], [19]. Basar and his students have also considered such 
problems and have studied the impact of noisy channels on
both the observations and the controls [20]. The area of control 
with communications constraints continues to attract attention 
and the reader is directed to the recent September 2004 issue 

3 The sequential rate-distortion bound is generally not attained even at higher 
rates except in the case of perfectly matched channels. 

4 In seminal work [12], there is no persistent disturbance acting on the 
unstable plant. 
Fig. 1. The "equivalence" between stabilization over noisy feedback channels
and reliable communication over noisy channels with feedback is the main 
result established in this paper. 



of IEEE Transactions on Automatic Control and the articles 
therein for a more comprehensive survey. 

Many of the issues that arise in the control context also arise 
for the conceptually simpler problem of merely estimating an 
unstable open-loop process 5 , across a noisy channel. For this 
estimation problem in the limit of large, but finite, end-to-end 
delays, we have proved a source coding theorem that shows 
that the distortion-rate bound is achievable. Furthermore, it 
is possible to characterize the information being produced 
by an unstable process [23]. It turns out that such processes 
produce two qualitatively distinct types of information when 
it comes to transport over a noisy channel. In addition to the 
classical Shannon-type of information found in traditional rate- 
distortion settings 6 , there is an essential core of information 
that captures the unstable nature of the source. While classical 
Shannon reliability suffices for the classical information, this 
unstable core requires anytime reliability for transport across 
a noisy channel. 7 As also discussed in this paper, anytime 
reliability is a sense of reliable transmission that lies between 
Shannon's classical $\epsilon$-sense of reliable transmission and his
zero-error reliability [25]. In [23], we also review how the 
sense of anytime reliability is linked to classical work on 
sequential tree codes with bounded delay decoding. 8 

The new feature in control systems is their essential interac- 
tivity. The information to be communicated is not a message 
known in advance that is used by some completely separate 
entity. Rather, it evolves through time and is used to control the 
very process being encoded. This introduces two interesting 
issues. First, causality is strictly enforced. The encoder and 
controller must act in real time and so taking the limit of 
large delays must be interpreted very carefully. Second, it 
is unclear what the status of the controlled process is. If 
the controller succeeds in stabilizing the process, it is no 
longer unstable. As explored in Section II-D, a purely external
non-interactive observer could treat the question of encoding 
the controlled closed-loop system state using classical tools 
for the encoding and communication of a stationary ergodic 

5 The unstable open-loop processes discussed here are first-order nonsta- 
tionary autoregressive processes [21], of which an important special case is 
the Wiener process considered by Berger [22]. 

6 In [23], we show how the classical part of the information determines the 
shape of the rate-distortion curve, while the unstable core is responsible for 
a shift of this curve along the rate axis. 

7 How to communicate such unstable processes over noisy channels had 
been an open problem since Berger had first developed a source-coding 
theorem for the Wiener process [24]. Berger had conjectured that it was 
impossible to transport such processes over generic noisy channels with 
asymptotically finite end-to-end distortion using traditional means. 

8 Reference [26] raised the possibility of such a connection early on. 



process. Despite having to observe and encode the exact same 
closed-loop process, the observer internal to the control system 
requires a channel as good as that required to communicate 
the unstable open-loop process. This seemingly paradoxical 
situation illustrates what can happen when the encoding of 
information and its use are coupled together by interactivity. 

In this paper (Part I), the basic equivalence between feed- 
back stabilization and reliable communication is established. 
The scalar problem (Figure 2) is formally introduced in
Section II, where classical capacity concepts are also shown
to be inadequate. In Section III, it is shown that adequate
feedback anytime capacity is necessary for there to exist an 
observer/controller pair able to stabilize the unstable system 
across the noisy channel. This connection is also used to give 
a sequential anytime version of the Schalkwijk/Kailath scheme 
for the AWGN channel with noiseless feedback. 

Section IV shows the sufficiency of feedback anytime
capacity for situations where the observer has noiseless access
to the channel outputs. In Section V, these sufficiency
results are generalized to the case where the observer only 
has noisy access to the plant state. Since the necessary and 
sufficient conditions are tight in many cases, these results show 
the asymptotic equivalence between the problem of control 
with "noisy feedback" and the problem of reliable sequential 
communication with noiseless feedback. In Section VI, these
results are further extended to the continuous time setting.
Finally, Section VII justifies why the problem of stabilization
of an unstable linear control system is "universal" in the same 
sense that the Shannon formulation of reliable transmission 
of messages over a noisy channel with (or without) feedback 
is universal. This is done by introducing a hierarchy of 
communication problems in which problems at a given level 
are equivalent to each other in terms of which channels are 
good enough to solve them. Problems high in the hierarchy 
are fundamentally more challenging than the ones below them 
in terms of what they require from the noisy channel. 

In Part II, the necessity and sufficiency results are general- 
ized to the case of multivariable control systems on an unstable 
eigenvalue by eigenvalue basis. The role of anytime capacity 
is played by a rate region corresponding to a vector of anytime 
reliabilities. If there is no explicit channel output feedback, the 
intrinsic delay of the control system's input-output behavior 
plays an important role. It shows that two systems with the 
same unstable eigenvalues can still have potentially different 
channel requirements. These results establish that in interactive 
settings, a single "application" can fundamentally require 
different senses of reliability for its data streams. No single 
number can adequately summarize the channel and any layered 
communication architecture should allow applications to adjust 
reliabilities on bitstreams. 

There are many results in this paper. In order not to burden 
the reader with repetitive details and unnecessarily lengthen 
this paper, we have adopted a discursive style in some of the 
proofs. The reader should not have any difficulty in filling in 
the omitted details. 
Fig. 2. Control over a noisy communication channel. The unstable scalar
system is persistently disturbed by $W_t$ and must be kept stable in closed-loop
through the actions of $\mathcal{O}, \mathcal{C}$.



II. Problem definition and basic challenges 

Section II-A formally introduces the control problem of
stabilizing an unstable scalar linear system driven by both a
control signal and a bounded disturbance. In Section II-B,
classical notions of capacity are reviewed along with how
to stabilize an unstable system with a finite rate noiseless
channel. In Section II-C, it is shown by example that the
classical concepts are inadequate when it comes to evaluating a
noisy channel for control purposes. Shannon's regular capacity
is too optimistic and zero-error capacity is too pessimistic.
Finally, Section II-D shows that the core issue of interactivity
is different from merely requiring the encoders and decoders
to be delay-free.



This definition requires the probability of a large state value 
to be appropriately bounded. A looser sense of stability is 
given by: 

Definition 2.2: A closed-loop dynamic system with state
$X_t$ is $\eta$-stable if there exists a constant $K$ s.t. $E[|X_t|^\eta] < K$
for all $t \ge 0$.

In both definitions, the bound is required to hold for all
possible sequences of bounded disturbances $\{W_t\}$ that satisfy
the given bound $\Omega$. We do not assume any specific probability
model governing the disturbances. Rather than having to
specify a specific target for the tail probability $f$, holding the
$\eta$-moment within bounds is a way of keeping large deviations
rare. The larger $\eta$ is, the more strongly very large deviations
are penalized. The advantage of $\eta$-stability is that it allows
constant factors to be ignored while making sharp asymptotic
statements. Furthermore, Section III-C shows that for generic
DMCs, no sense stronger than $\eta$-stability is feasible.

The goal in this paper is to find necessary and sufficient
conditions on the noisy channel for there to exist an observer
$\mathcal{O}$ and controller $\mathcal{C}$ so that the closed loop system shown
in Figure 2 is stable in the sense of Definition 2.1 or 2.2.
The problem is considered under different information patterns
corresponding to different assumptions about what information
is available at the observer $\mathcal{O}$. The controller is always
assumed to just have access to the entire past history 10 of
channel outputs.

For discrete-time linear systems, the intrinsic rate of infor- 
mation production (in units of bits per time) equals the sum of 
the logarithms (base 2) of the unstable eigenvalues [9]. In the 
scalar case studied here, this is just $\log_2 \lambda$. This means that
it is generically 11 impossible to stabilize the system in any
reasonable sense if the feedback channel's Shannon classical
capacity $C < \log_2 \lambda$.
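The rate computation described above can be sketched in a few lines. This is only an illustration of the stated formula (sum of base-2 logarithms of the unstable eigenvalues); the function name `intrinsic_rate_bits` and the example eigenvalue lists are hypothetical and not taken from the paper.

```python
import math

def intrinsic_rate_bits(eigenvalues):
    """Sum of log2|lambda| over the unstable eigenvalues (|lambda| > 1).

    Stable eigenvalues contribute nothing to the rate.
    """
    return sum(math.log2(abs(lam)) for lam in eigenvalues if abs(lam) > 1)

# Scalar case: the rate is just log2(lambda).
print(intrinsic_rate_bits([1.5]))
# A system with one unstable and one stable eigenvalue (illustrative values).
print(intrinsic_rate_bits([4.0, 0.5]))
```

In the scalar setting of this paper only the single-eigenvalue call is relevant; the list form is shown because the cited result is stated as a sum.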



A. The control problem 

$$X_{t+1} = \lambda X_t + U_t + W_t, \quad t \ge 0 \qquad (1)$$

where $\{X_t\}$ is an $\mathbb{R}$-valued state process, $\{U_t\}$ is an $\mathbb{R}$-valued
control process, and $\{W_t\}$ is a bounded noise/disturbance
process s.t. $|W_t| \le \frac{\Omega}{2}$. This bound is assumed to hold with
certainty. For convenience, we also assume a known initial
condition $X_0 = 0$.

To make things interesting, consider $\lambda > 1$ so the open-loop
system is exponentially unstable. The distributed nature
of the problem (shown in Figure 2) comes from having
a noisy communication channel in the feedback path. The
observer/encoder system $\mathcal{O}$ observes $X_t$ and generates inputs
$a_t$ to the channel. It may or may not have access to the
control signals $U_t$ or past channel outputs $B_{t-1}$ as well. The
decoder/controller 9 system $\mathcal{C}$ observes channel outputs $B_t$ and
generates control signals $U_t$. Both $\mathcal{O}, \mathcal{C}$ are allowed to have
unbounded memory and to be nonlinear in general.

Definition 2.1: A closed-loop dynamic system with state
$X_t$ is $f$-stable if $P(|X_t| > m) < f(m)$ for all $t \ge 0$.

9 Because the decoder and controller are both on the same side of the
communication channel, they can be lumped together into a single box. 



B. Classical notions of channels and capacity 

Definition 2.3: A discrete time channel is a probabilistic
system with an input. At every time step $t$, it takes an input
$a_t \in \mathcal{A}$ and produces an output $b_t \in \mathcal{B}$ with probability 12
$p(B_t | a_1^t, b_1^{t-1})$ where the notation $a_1^t$ is shorthand for the
sequence $a_1, a_2, \ldots, a_t$. In general, the current channel output
is allowed to depend on all inputs so far as well as on past
outputs.

The channel is memoryless if, conditioned on $a_t$, $B_t$ is
independent of any other random variable in the system that
occurs at time $t$ or earlier. All that needs to be specified is
$p(B_t | a_t)$.

The maximum rate achievable for a given sense of reliable 
communication is called the associated capacity. Shannon's 

10 In Section III-C.3 it is shown that anything less than that can not work
in general.

11 There are pathological cases where it is possible to stabilize a system
with less rate. These occur when the driving disturbance is particularly
structured instead of just being unknown but bounded. An example is when
the disturbance only takes on values $\pm 1$ while $\lambda = 4$. Clearly only one bit
per unit time is required even though $\log_2 \lambda = 2$.

12 This is a probability mass function in the case of discrete alphabets $\mathcal{B}$, but
is more generally an appropriate probability measure over the output alphabet
$\mathcal{B}$.
classical reliability requires that after a suitably large end-to-end
delay 13 $n$, the average probability of error on each bit
is below a specified $\epsilon$. Shannon classical capacity $C$ can also
be calculated in the case of memoryless channels by solving
an optimization problem:



$$C = \sup_{P(A)} I(A; B)$$



where the maximization is over the input probability distribu- 
tion and I(A; B) represents the mutual information through 
the channel [1]. This is referred to as a single letter character- 
ization of channel capacity for memoryless channels. Similar 
formulae exist using limits in cases of channels with memory. 
There is another sense of reliability and its associated capacity 
$C_0$ called zero-error capacity which requires the probability
of error to be exactly zero with sufficiently large n. It does 
not have a simple single-letter characterization [25]. 

Example 2.1: Consider a system (1) with $\Omega = 1$ and $\lambda = \frac{3}{2}$.
Suppose that the memoryless communication channel is a
noiseless one bit channel. So $\mathcal{A} = \mathcal{B} = \{0, 1\}$ and
$p(B_t = 1 | a_t = 1) = p(B_t = 0 | a_t = 0) = 1$ while
$p(B_t = 1 | a_t = 0) = p(B_t = 0 | a_t = 1) = 0$. This channel has
$C_0 = C = 1 > \log_2 \frac{3}{2}$.



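Use a memoryless observer

$$\mathcal{O}(x) = \begin{cases} 0 & \text{if } x \le 0 \\ 1 & \text{if } x > 0 \end{cases}$$

and memoryless controller

$$\mathcal{C}(B) = \begin{cases} +\frac{3}{2} & \text{if } B = 0 \\ -\frac{3}{2} & \text{if } B = 1 \end{cases}$$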


Assume that the closed loop system state is within the
interval $[-2, +2]$. If it is positive, then it is in the interval
$[0, +2]$. At the next time, $\frac{3}{2}X + W$ would be in the interval
$[-\frac{1}{2}, \frac{7}{2}]$. The applied control of $-\frac{3}{2}$ shifts the state back to
within the interval $[-2, +2]$. The same argument holds by
symmetry on the negative side. Since it starts at 0, by induction
it will stay within $[-2, +2]$ forever. As a consequence, the
second moment will stay less than 4 for all time, and all the
other moments will be similarly bounded.
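The induction argument of Example 2.1 can be checked numerically with a minimal sketch. The observer and controller below follow the example ($\lambda = \frac{3}{2}$, $\Omega = 1$); drawing the disturbance uniformly from $[-\frac{1}{2}, \frac{1}{2}]$ is an assumption made only for the simulation, since the example itself allows any bounded sequence. The function names are illustrative.

```python
import random

LAM = 1.5      # unstable gain lambda = 3/2 from Example 2.1
OMEGA = 1.0    # disturbance bound: |W_t| <= OMEGA / 2

def observer(x):
    """Memoryless one-bit observer: 0 if x <= 0, 1 if x > 0."""
    return 1 if x > 0 else 0

def controller(b):
    """Memoryless controller: apply -3/2 on a reported 1, +3/2 on a reported 0."""
    return -1.5 if b == 1 else 1.5

def simulate(steps, seed=0):
    """Run the closed loop from X_0 = 0 and return the state trajectory."""
    rng = random.Random(seed)
    x, trajectory = 0.0, [0.0]
    for _ in range(steps):
        b = observer(x)  # noiseless one-bit channel: B_t equals the channel input
        # Assumed disturbance model for illustration: uniform on [-1/2, 1/2].
        w = rng.uniform(-OMEGA / 2, OMEGA / 2)
        x = LAM * x + controller(b) + w
        trajectory.append(x)
    return trajectory

traj = simulate(10_000)
print(max(abs(x) for x in traj))  # stays within the interval [-2, +2]
```

A random disturbance does not exercise the worst case; replacing `w` with a fixed `+OMEGA / 2` or `-OMEGA / 2` is a quick way to probe the boundary of the interval.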

In addition to the Shannon and zero-error senses of reli- 
ability, information theory has various reliability functions. 
Such reliability functions (or error exponents) are traditionally 
considered an internal matter for channel coding and were 
viewed as mathematically tractable proxies for the issue of 
implementation complexity [1]. Reliability functions study 
how fast the probability of error goes to zero as the relevant 
system parameter is increased. Thus, the reliability functions 
for block-codes are given in terms of the block length, 
reliability functions for convolutional codes in terms of the 
constraint length [27], and reliability functions for variable-length
codes in terms of the expected block length [28]. With
the rise of sparse code constructions and iterative decoding, 
the prominence of error exponents in channel coding has 

13 Traditionally, the community has used block-length for a block code as 
the fundamental quantity rather than delay. It is easy to see that doing encoding 
and decoding in blocks of size n corresponds to a delay of between n and 
2n on the individual bits being communicated. 



diminished since the computational burden is not superlinear 
in the block-length. 

For memoryless channels, the presence or absence of feed- 
back does not alter the classical Shannon capacity [1]. More 
surprisingly, for symmetric DMCs, the fixed block coding 
reliability functions also do not change with feedback, at 
least in the high rate regime [29]. From a control perspective, 
this is the first indication that neither Shannon's capacity nor 
block-coding reliability functions are the perfect fit for control 
applications. 

C. Counterexample showing classical concepts are inadequate 

We use erasure channels to construct a counterexample 
showing the inadequacy of the Shannon classical capacity in 
characterizing channels for control. While both erasure and 
AWGN channels are easy to deal with, it turns out that AWGN 
channels can not be used for a counterexample since they 
can be treated in the classical LQG framework [15]. The 
deeper reason for why AWGN channels do not provide a 
counterexample is given in Section III-C.4.

1) Erasure channels: The packet erasure channel models 
situations where errors can be reliably detected at the receiver. 
In the model, the packet being sent is lost with probability
$\delta$, but otherwise it makes it through
correctly. Explicitly:

Definition 2.4: The $L$-bit packet erasure channel is a memoryless
channel with $\mathcal{A} = \{0, 1\}^L$, $\mathcal{B} = \{0, 1\}^L \cup \{\emptyset\}$ and
$p(x|x) = 1 - \delta$ while $p(\emptyset|x) = \delta$.

It is well known that the Shannon capacity of the packet
erasure channel is $(1 - \delta)L$ bits per channel use regardless
of whether the encoder has feedback or not [1]. Furthermore,
because a long string of erasures is always possible, the zero-error
capacity $C_0$ of this channel is 0. There are also variable-length
packet erasure channels where the packet-length is
something the encoder can choose. See [30] for a discussion
of such channels.
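As a sanity check on the $(1 - \delta)L$ figure, the mutual information of the $L$-bit packet erasure channel can be evaluated by direct enumeration. The sketch below evaluates $I(A; B)$ at a uniform input only, rather than performing the full maximization; for this channel that evaluation already matches the stated capacity. The helper names and the string label `"erased"` for the erasure symbol are illustrative choices.

```python
import math
from itertools import product

def mutual_information_bits(p_input, channel):
    """I(A;B) in bits for an input pmf p_input[a] and transitions channel[a][b]."""
    p_output = {}
    for a, pa in p_input.items():
        for b, pba in channel[a].items():
            p_output[b] = p_output.get(b, 0.0) + pa * pba
    total = 0.0
    for a, pa in p_input.items():
        for b, pba in channel[a].items():
            if pa > 0 and pba > 0:
                total += pa * pba * math.log2(pba / p_output[b])
    return total

def packet_erasure_channel(L, delta):
    """Transition table: each L-bit packet arrives intact w.p. 1 - delta."""
    packets = list(product((0, 1), repeat=L))
    return {x: {x: 1 - delta, "erased": delta} for x in packets}

L, delta = 3, 0.5
channel = packet_erasure_channel(L, delta)
uniform = {x: 1 / len(channel) for x in channel}
print(mutual_information_bits(uniform, channel), (1 - delta) * L)
```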

To construct a simple counterexample, consider a further 
abstraction: 

Definition 2.5: The real packet erasure channel has $\mathcal{A} = \mathcal{B} = \mathbb{R}$
and $p(x|x) = 1 - \delta$ while $p(0|x) = \delta$.

This model has also been explored in the context of Kalman 
filtering with lossy observations [31], [32]. It has infinite clas- 
sical capacity since a single real number can carry arbitrarily 
many bits within its binary expansion, while the zero-error 
capacity remains 0. 

2) The inadequacy of Shannon capacity: Consider the
problem from Example 2.1 except over the real erasure
channel instead of the one bit noiseless channel. The goal
is for the second moment to be bounded ($\eta = 2$) and recall
that $\lambda = \frac{3}{2}$. Let $\delta = \frac{1}{2}$ so that there is a 50% chance of any
real number being erased. Assume that the bounded disturbance
$W_t$ is zero-mean and iid with variance $\sigma^2$. By
assuming an explicit probability model for the disturbance, the
problem is only made easier as compared to the arbitrarily-varying
but bounded model introduced earlier.

In this case, the optimal control is obvious: set $a_t = X_t$
as the channel input and use $U_t = -\lambda B_t$ as the control.
Fig. 3. The control system with an additional passive joint source-channel
encoder $\mathcal{E}_p$ watching the closed loop state $X_t$ and communicating it to a
passive estimator $\mathcal{D}_p$. The controller $\mathcal{C}$ implicitly needs a good causal estimate
for $X_t$ and the passive estimator $\mathcal{D}_p$ explicitly needs the same thing. Which
requires the better channel?



With every successful reception, the system state is reset to
the initial condition of zero. For an arbitrary time $t$, the time
since it was last reset is distributed like a geometric($\frac{1}{2}$) random
variable. Thus the second moment is:
This diverges as $t \to \infty$ since $\frac{9}{8} > 1$.

Notice that the root of the problem is that $(\frac{3}{2})^2 (\frac{1}{2}) > 1$.
Intuitively, the system is exploding faster than the noisy
channel is able to give reliability. This causes the second
moment to diverge. In contrast, the first moment $E[|X_t|]$ is
bounded for all $t$ since $(\frac{3}{2})(\frac{1}{2}) < 1$.
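The contrast between the two moments can be made concrete with a short recursion. This is a sketch derived from the stated control law, not the paper's own expression: after a successful reception the next state is $W_t$, and after an erasure it is $\lambda X_t + W_t$, so with $W_t$ zero-mean and independent of the past, $E[X_{t+1}^2] = \delta \lambda^2 E[X_t^2] + \sigma^2$, while the triangle inequality gives $E[|X_{t+1}|] \le \delta \lambda E[|X_t|] + \frac{\Omega}{2}$. The default `sigma2 = 1/12` (a uniform disturbance on $[-\frac{1}{2}, \frac{1}{2}]$) is an illustrative assumption.

```python
LAM, DELTA = 1.5, 0.5   # lambda = 3/2 and erasure probability 1/2

def second_moment(steps, sigma2=1.0 / 12.0):
    """Iterate m_{t+1} = delta * lambda^2 * m_t + sigma^2 starting from m_0 = 0."""
    m, history = 0.0, [0.0]
    for _ in range(steps):
        m = DELTA * LAM ** 2 * m + sigma2
        history.append(m)
    return history

def first_moment_bound(steps, w_abs_bound=0.5):
    """Iterate the upper bound b_{t+1} = delta * lambda * b_t + Omega/2, b_0 = 0."""
    b, history = 0.0, [0.0]
    for _ in range(steps):
        b = DELTA * LAM * b + w_abs_bound
        history.append(b)
    return history

# The second moment grows like (9/8)^t; the first-moment bound settles below 2.
print(second_moment(200)[-1])
print(first_moment_bound(200)[-1])
```

The growth factor of the first recursion is $\delta \lambda^2 = \frac{9}{8}$ and that of the second is $\delta \lambda = \frac{3}{4}$, which is exactly the dichotomy described in the text.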

The adequacy of the channel depends on which moment is
required to be bounded. Thus no single-number characterization
like classical capacity can give the figure of merit needed
to evaluate a channel for control applications.

D. Non-interactive observation of a closed-loop process 

Consider the system shown in Figure 3. In this, there is an
additional passive joint source-channel encoder $\mathcal{E}_p$ watching
the closed loop state $X_t$ and communicating it to a passive
estimator $\mathcal{D}_p$ through a second independent noisy channel.
Both the passive and internal observers have access to the
same plant state and we can also require the passive encoder
and decoder to be causal, so that no end-to-end delay is permitted.



At first glance, it certainly appears that the communication 
situations are symmetric. If anything, the internal observer is 
better off since it also has access to the control signals while 
the passive observer is denied access to them. 

Suppose that the closed-loop process (1) had already been
stabilized by the observer and controller system of Example 2.1 so that
the second moment $E[X_t^2] < K$ for all $t$. Suppose that the
noisy channel facing the passive encoder is the real $\frac{1}{2}$-erasure
channel of the previous section. It is interesting to consider
how well the passive observer does at estimating this process.

The optimal encoding rule is clear: set $a_t = X_t$. It is
certainly feasible to use $\hat{X}_t = B_t$ itself as the estimator for
the process. This passive observation system clearly achieves
$E[(\hat{X}_t - X_t)^2] < \frac{K}{2} < K$ since the probability of non-erasure
is $\frac{1}{2}$. The causal decoding rule is able to achieve a finite end-to-end
squared error distortion over this noisy channel in a
causal and memoryless way.
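A small simulation illustrates why the passive estimate is harmless. The sketch below runs the stabilized loop of Example 2.1 and, in parallel, sends the state through a real erasure channel whose erased output reads as 0, using the channel output directly as the estimate. The uniform disturbance and the function name `passive_observation` are assumptions for illustration only.

```python
import random

def passive_observation(steps, delta=0.5, seed=0):
    """Return (mean square of the state, mean squared estimation error)."""
    rng = random.Random(seed)
    x = 0.0
    state_sq, error_sq = 0.0, 0.0
    for _ in range(steps):
        # Passive link: real erasure channel, an erasure is received as 0.
        b = 0.0 if rng.random() < delta else x
        x_hat = b                       # memoryless, delay-free estimate
        state_sq += x * x
        error_sq += (x_hat - x) ** 2
        # Internal loop of Example 2.1 over its own noiseless one-bit channel.
        u = -1.5 if x > 0 else 1.5
        w = rng.uniform(-0.5, 0.5)      # assumed disturbance model
        x = 1.5 * x + u + w
    return state_sq / steps, error_sq / steps

state_ms, error_ms = passive_observation(50_000)
print(state_ms, error_ms)
```

Because the estimation error is either 0 or $X_t$ itself, the error never feeds back into the plant, and its mean square cannot exceed that of the already bounded state.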

This example makes it clear that the challenge here is arising 
from interactivity, not simply being forced to be delay-free. 
The passive external encoder and decoder do not have to face 
the unstable nature of the source while the internal observer 
and controller do. An error made while estimating $X_t$ by the
passive decoder has no consequence for the next state $X_{t+1}$
while a similar error by the controller does.

III. Anytime capacity and its necessity 

Anytime reliability is introduced and related to classical 
notions of reliability in [23]. Here, the focus is on the maxi- 
mum rate achievable for a given sense of reliability rather than 
the maximum reliability possible at a given rate. The two are 
of course related since fundamentally there is an underlying 
region of feasible rate/reliability pairs. 

Since the open-loop system state has the potential to grow 
exponentially, the controller's knowledge of the past must 
become certain at a fast rate in order to prevent a bad 
decision made in the past from continuing to corrupt the 
future. When viewed in the context of reliably communicating 
bits from an encoder to a decoder, this suggests that the 
estimates of the bits at the decoder must become increasingly 
reliable with time. The sense of anytime reliability is made
precise in Section III-A. Section III-B then establishes the
key result of this paper relating the problem of stabilization
to the reliable communication of messages in the anytime
sense. Finally, some consequences of this connection are
studied in Section III-C. Among these consequences is a
sequential generalization of the Schalkwijk/Kailath scheme 
for communication over an AWGN channel that achieves a 
doubly-exponential convergence to zero of the probability of 
bit error universally over all delays simultaneously. 

A. Anytime reliability and capacity 

The entire message is not assumed to be known ahead of 
time. Rather, it is made available gradually as time evolves. 
For simplicity of notation, let $M_i$ be the $R$ bit message that
the channel encoder gets at time $i$. At the channel decoder, no
target delay is assumed, i.e., the channel decoder does not
necessarily know when the message $i$ will be needed by the
Fig. 4. The problem of communicating messages in an anytime fashion.
Both the encoder $\mathcal{E}$ and decoder $\mathcal{D}$ are causal maps and the decoder in
principle provides updated estimates for all past messages. These estimates
must converge to the true message values appropriately rapidly with increasing
delay.



application. A past message may even be needed more than
once by the application. Consequently, the anytime decoder
produces estimates $\hat{M}_i(t)$ which are the best estimates for
message $i$ at time $t$ based on all the channel outputs received so
far. If the application is using the past messages with a delay $d$,
the relevant probability of error is $P(\hat{M}_1^{t-d}(t) \neq M_1^{t-d})$. This
corresponds to an uncorrected error anywhere in the distant
past (i.e., on messages $M_1, M_2, \ldots, M_{t-d}$) beyond $d$ channel
uses ago.

Definition 3.1: As illustrated in Figure 4, a rate $R$ communication
system over a noisy channel is an encoder $\mathcal{E}$ and
decoder $\mathcal{D}$ pair such that:

• $R$-bit message $M_i$ enters 14 the encoder at discrete time $i$.

• The encoder produces a channel input at integer times
based on all information that it has seen so far. For
encoders with access to feedback with delay $1 + \theta$, this
also includes the past channel outputs $B_1^{t-1-\theta}$.

• The decoder produces updated channel estimates $\hat{M}_i(t)$
for all $i \le t$ based on all channel outputs observed until
time $t$.

A rate R sequential communication system achieves any- 
time reliability a if there exists a constant K such that: 



V{M{{t) ^ Ml) < K2~ a ^- l) 



(2) 



holds for every i,t. The probability is taken over the chan- 
nel noise, the R bit messages Mi, and all of the common 
randomness available in the system. 

If (J3 holds for every possible realization of the messages 
M, then the system is said to achieve uniform anytime 
reliability a. 
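As a rough illustration of the bound in the definition above, the following Python sketch checks a table of error probabilities against $K 2^{-\alpha(t-i)}$. The table `err`, the function names, and the numbers are hypothetical and only for illustration; on a finite table some constant $K$ always exists, so this is a finite-horizon sanity check rather than a proof of anytime reliability.

```python
def smallest_anytime_constant(err, alpha):
    """Smallest K with err[(i, t)] <= K * 2**(-alpha * (t - i)) on this table."""
    return max(p * 2.0 ** (alpha * (t - i)) for (i, t), p in err.items())


def achieves_anytime_reliability(err, alpha, K):
    """Check the anytime-reliability bound for every (i, t) entry in the table."""
    return all(p <= K * 2.0 ** (-alpha * (t - i)) for (i, t), p in err.items())


# Hypothetical error table: the error probability halves with each unit of delay.
err = {(i, t): 0.5 * 2.0 ** (-(t - i))
       for i in range(1, 6) for t in range(i, i + 10)}

K = smallest_anytime_constant(err, alpha=1.0)
```

With this made-up table the bound holds with $\alpha = 1$ and the computed $K$, but fails for $\alpha = 2$ with the same $K$, since the errors do not decay fast enough.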

14 In what follows, messages are considered to be composed of bits for
simplicity of exposition. The $i$-th bit arrives at the encoder at time $\frac{i}{R}$ and
thus $M_i$ is composed of the bits $S_{\lfloor (i-1)R \rfloor + 1}^{\lfloor iR \rfloor}$.



Communication systems that achieve anytime reliability are
called anytime codes and similarly for uniform anytime codes.

We could alternatively have bounded the probability of error
by $2^{-\alpha(d - \log_2 K)}$ and interpreted $\log_2 K$ as the minimum delay
imposed by the communication system.

Definition 3.2: The $\alpha$-anytime capacity $C_{\text{any}}(\alpha)$ of a channel
is the least upper bound of the rates $R$ (in bits) at which
the channel can be used to construct a rate $R$ communication
system that achieves uniform anytime reliability $\alpha$.

Feedback anytime capacity is used to refer to the anytime
capacity when the encoder has access to noiseless feedback of
the channel outputs with unit delay.

The requirement for exponential decay in the probability of
error with delay is reminiscent of the block-coding reliability
functions $E(R)$ of a channel given in [1]. There is one crucial
difference. With standard error exponents, both the encoder
and decoder vary with blocklength or delay $n$. Here, the
encoding is required to be fixed and the decoder in principle
has to work at all delays since it must produce updated
estimates of the message $M_i$ at all times $t \ge i$.

This additional requirement is why it is called "anytime"
capacity. The decoding process can be queried for a given
bit at any time and the answer is required to be increasingly
accurate the longer we wait. The anytime reliability $\alpha$ specifies
the exponential rate at which the quality of the answers
must improve. The anytime sense of reliable transmission
lies between that represented by classical zero-error capacity
$C_0$ (probability of error becomes zero at a large but finite
delay) and classical capacity $C$ (probability of error becomes
something small at a large but finite delay). It is clear that
$\forall \alpha,\ C_0 \le C_{\text{any}}(\alpha) \le C$.

By using a random coding argument over infinite tree codes,
it is possible to show the existence of anytime codes without
using feedback between the encoder and decoder for all rates
less than the Shannon capacity. This shows:

$$C_{\text{any}}(E_r(R)) \ge R$$

where $E_r(R)$ is Gallager's random coding error exponent
calculated in base 2 and $R$ is the rate in bits [33], [23]. Since
feedback plays an essential role in control, it turns out that
we are interested in the anytime capacity with feedback. It is
interesting to note that in many cases for which the block-coding
error exponents are not increased with feedback, the
anytime reliabilities are increased considerably [6].
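To make the relation $C_{\text{any}}(E_r(R)) \ge R$ concrete, the sketch below evaluates Gallager's random coding exponent in base 2 for a binary symmetric channel with a uniform input distribution. The choice of the BSC, the crossover probability, the grid search over $\rho \in [0,1]$, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
import math


def bsc_capacity(p):
    """Shannon capacity (bits per use) of a BSC with crossover probability p."""
    h = -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)
    return 1.0 - h


def bsc_E0(rho, p):
    """Gallager's E_0(rho) in bits for a BSC with uniform inputs."""
    s = 1.0 / (1.0 + rho)
    return rho - (1.0 + rho) * math.log2(p ** s + (1.0 - p) ** s)


def bsc_random_coding_exponent(R, p, grid=10001):
    """E_r(R) = max over rho in [0, 1] of E_0(rho) - rho * R, by grid search."""
    best = 0.0
    for k in range(grid):
        rho = k / (grid - 1)
        best = max(best, bsc_E0(rho, p) - rho * R)
    return best


p = 0.1                      # illustrative crossover probability
C = bsc_capacity(p)
alpha_at_quarter_rate = bsc_random_coding_exponent(0.25, p)
```

Under these assumptions, any rate $R < C$ gives a strictly positive exponent, which the text above identifies as an anytime reliability achievable at rate $R$ without feedback; the exponent shrinks to zero as $R$ approaches $C$.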

B. Necessity of anytime capacity 

Anytime reliability and capacity are defined in terms of 
digital messages that must be reliably communicated from 
point to point. Stability is a notion involving the analog value 
of the state of a plant in interaction with a controller over a 
noisy feedback channel. At first glance, these two problems 
appear to have nothing in common except the noisy channel. 
Even on that point there is a difference. The observer/encoder
$\mathcal{O}$ in the control system may have no explicit access to the
noisy output of the channel. It can appear to be using the
noisy channel without feedback. Despite this, it turns out that
the relevant digital communication problem involves access
to the noisy channel with noiseless channel feedback coming
back to the message encoder.

Theorem 3.3: For a given noisy channel and $\eta > 0$, if
there exists an observer $\mathcal{O}$ and controller $\mathcal{C}$ for the unstable
scalar system that achieves $E[|X_t|^\eta] < K$ for all sequences of
bounded driving noise $|W_t| \le \frac{\Omega}{2}$, then the channel's feedback
anytime capacity $C_{\text{any}}(\eta \log_2 \lambda) \ge \log_2 \lambda$ bits per channel
use.

The proof of this spans the next few sections. Assume that
there is an observer/controller pair $(\mathcal{O}, \mathcal{C})$ that can $\eta$-stabilize
an unstable system with a particular $\lambda$ and is robust to all
bounded disturbances of size $\Omega$. The goal is to use the pair
to construct a rate $R < \log_2 \lambda$ anytime encoder and decoder
for the channel with noiseless feedback, thereby reducing$^{15}$
the problem of anytime communication to a problem of
stabilization.

The heart of the construction is illustrated in Figure 5. The
"black-box" observer and controller are wrapped around a
simulated plant mimicking (1). Since the $\{U_t\}$ must be generated
by the black-box controller $\mathcal{C}$ and the $\lambda$ is prespecified, the
disturbances $\{W_t\}$ must be used to carry the message. So, the
encoder must embed the messages $\{M_t\}$ into an appropriate
sequence $\{W_t\}$, taking care to stay within the $\Omega$ size limit.

While both the observer and controller can be simulated 
at the encoder thanks to the noiseless channel output feed- 
back, at the decoder only the channel outputs are available. 
Consequently, these channel outputs are connected to a copy 
of the black-box controller C, thereby giving access to the 
controls {Ut} at the decoder. To extract the messages from 
these control signals, they are first causally preprocessed 
through a simulated copy of the unstable plant, except with no 
disturbance input. All past messages are then estimated from 
the current state of this simulated plant. 

The key is to think of the simulated plant state as the sum
of the states of two different unstable LTI systems. The first,
with state denoted $\tilde{X}_t$, is driven entirely by the controls and
starts in state 0.

$$\tilde{X}_{t+1} = \lambda \tilde{X}_t + U_t \qquad (3)$$

$\tilde{X}_t$ is available at both the decoder and the encoder due to
the presence of noiseless feedback.$^{16}$ The other, with state
denoted $\check{X}_t$, is driven entirely by a simulated driving noise
that is generated from the data stream to be communicated.

$$\check{X}_{t+1} = \lambda \check{X}_t + W_t \qquad (4)$$

The sum $X_t = (\tilde{X}_t + \check{X}_t)$ behaves exactly like it was coming
from (1) and is fed to the observer which uses it to generate
inputs for the noisy channel.

Fig. 5. The construction of a feedback anytime code from a control system.
The messages are used to generate the $\{W_t\}$ inputs which are causally
combined to generate $\{\check{X}_t\}$ within the encoder. The channel outputs are
used to generate control signals at both the encoder and decoder. Since the
simulated plant is stable, $-\tilde{X}$ and $\check{X}$ are close to each other. The past message
bits are estimated from the $\tilde{X}$ at the decoder.

The fact that the original observer/controller pair stabilized
the original system implies that $|X_t| = |\check{X}_t - (-\tilde{X}_t)|$ is small
and hence $-\tilde{X}_t$ stays close to $\check{X}_t$.

1) Encoding data into the state: As long as the bound $\Omega$ is
satisfied, the encoder is free to choose any disturbance$^{17}$ for the
simulated plant. The choice will be determined by the data rate
$R$ and the specific messages to be sent. Rather than working
with general messages $M_i$, consider a bitstream $S_i$ with bit $i$
becoming available at time $\frac{i}{R}$. Everything generalizes naturally
to non-binary alphabets for the messages, but the notation is
cleaner in the binary case with $S_i = \pm 1$.

$\check{X}_t$ is the part of $X_t$ driven only by the $\{W_t\}$.

$$\check{X}_t = \lambda \check{X}_{t-1} + W_{t-1} = \sum_{i=0}^{t-1} \lambda^{i} W_{t-1-i} = \lambda^{t-1} \sum_{j=0}^{t-1} \lambda^{-j} W_j$$

This looks like the representation of a fractional number in
base $\lambda$ which is then multiplied by $\lambda^{t-1}$. This is exploited in
the encoding by choosing the bounded disturbance sequence



15 In traditional rate-distortion theory, this "necessity" direction is shown 
by going through the mutual information characterizations of both the rate- 
distortion function and the channel capacity function. In the case of stabiliza- 
tion, mutual information is not discriminating enough and so the reduction of 
anytime reliable communication to stabilization must be done directly. 

16 If the controller is randomized, then the randomness is required to be 
common and shared between the encoder and decoder. 



17 In [23], a similar strategy is followed assuming a specific density
for the iid disturbance $W_t$. In that context, it is important to choose a
simulated disturbance sequence that behaves stochastically like $W_t$. This is
accomplished by using common randomness shared between the encoder and
decoder to dither the kind of disturbances produced here into ones with the
desired density.
so that$^{18}$

$$\check{X}_t = \gamma \lambda^{t} \sum_{k=0}^{\lfloor Rt \rfloor} (2 + \epsilon_1)^{-k} S_k \qquad (5)$$

where $S_k$ is the $k$-th bit$^{19}$ of data that the anytime encoder
has to send and $\lfloor Rt \rfloor$ is just the total number of bits that are
available by time $t$. $\gamma, \epsilon_1$ are constants to be specified.

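To see that (5) is always possible to achieve by appropriate
choice of $W$, use induction. (5) clearly holds for $t = 0$. Now
assume that it holds for time $t$ and consider time $t + 1$:

$$\check{X}_{t+1} = \lambda \check{X}_t + W_t = \gamma \lambda^{t+1} \Big( \sum_{k=0}^{\lfloor Rt \rfloor} (2 + \epsilon_1)^{-k} S_k \Big) + W_t$$

So setting

$$W_t = \gamma \lambda^{t+1} \sum_{k=\lfloor Rt \rfloor + 1}^{\lfloor R(t+1) \rfloor} (2 + \epsilon_1)^{-k} S_k \qquad (6)$$

gives the desired result. Manipulate (6) to get

$$W_t = \gamma \lambda^{t+1} (2 + \epsilon_1)^{-\lfloor Rt \rfloor} \sum_{j=1}^{\lfloor R(t+1) \rfloor - \lfloor Rt \rfloor} (2 + \epsilon_1)^{-j} S_{\lfloor Rt \rfloor + j}
= \gamma \lambda (2 + \epsilon_1)^{t \frac{\log_2 \lambda}{\log_2 (2 + \epsilon_1)} - \lfloor Rt \rfloor} \sum_{j=1}^{\lfloor R(t+1) \rfloor - \lfloor Rt \rfloor} (2 + \epsilon_1)^{-j} S_{\lfloor Rt \rfloor + j}$$

To keep this bounded, choose

$$\epsilon_1 = 2^{\frac{\log_2 \lambda}{R}} - 2 \qquad (7)$$

which is strictly positive if $R < \log_2 \lambda$. Applying that
substitution gives

$$|W_t| = \Big| \gamma \lambda (2 + \epsilon_1)^{Rt - \lfloor Rt \rfloor} \sum_{j=1}^{\lfloor R(t+1) \rfloor - \lfloor Rt \rfloor} (2 + \epsilon_1)^{-j} S_{\lfloor Rt \rfloor + j} \Big| < |\gamma \lambda (2 + \epsilon_1)| = |\gamma \lambda^{1 + \frac{1}{R}}|$$

So by choosing

$$\gamma = \frac{\Omega}{2 \lambda^{1 + \frac{1}{R}}} \qquad (8)$$

the simulated disturbance is guaranteed to stay within the
specified bounds.

2) Extracting data bits from the state estimate:
Lemma 3.1: Given a channel with access to noiseless feedback,
for any rate $R < \log_2 \lambda$, it is possible to encode bits
into the simulated scalar plant so that the uncontrolled process
behaves like (5) by using disturbances given in (6) and the
formulas (7) and (8). At the output end of the noisy channel,
it is possible to extract estimates $\hat{S}_i(t)$ for the $i$-th bit sent for
which the error event

$$\{\omega \mid \exists i \le j,\ \hat{S}_i(t) \neq S_i\} \subseteq \Big\{\omega \mid |X_t| \ge \lambda^{t - \frac{j}{R}} \Big( \frac{\gamma \epsilon_1}{1 + \epsilon_1} \Big) \Big\} \qquad (9)$$

and thus:

$$\mathcal{P}(\hat{S}_1^j(t) \neq S_1^j(t)) \le \mathcal{P}\Big( |X_t| \ge \lambda^{t - \frac{j}{R}} \Big( \frac{\gamma \epsilon_1}{1 + \epsilon_1} \Big) \Big) \qquad (10)$$

Proof: Here $\omega$ is used to denote members of the underlying
sample space.$^{20}$

The decoder has $-\tilde{X}_t = \check{X}_t - X_t$ which is close to $\check{X}_t$
since $X_t$ is small. To see how to extract bits from $-\tilde{X}_t$, first
consider how to recursively extract those bits from $\check{X}_t$.

Starting with the first bit, notice that the set of all possible
$\check{X}_t$ that have $S_0 = +1$ is separated from the set of all possible
$\check{X}_t$ that have $S_0 = -1$ by a gap of

$$\gamma \lambda^{t} \Big( \Big(1 - \sum_{k=1}^{\lfloor Rt \rfloor} (2 + \epsilon_1)^{-k}\Big) - \Big(-1 + \sum_{k=1}^{\lfloor Rt \rfloor} (2 + \epsilon_1)^{-k}\Big) \Big)
> \gamma \lambda^{t}\, 2 \Big(1 - \sum_{k=1}^{\infty} (2 + \epsilon_1)^{-k}\Big)
= \gamma \lambda^{t}\, 2 \Big(1 - \frac{1}{1 + \epsilon_1}\Big)
= \lambda^{t} \Big( \frac{2 \epsilon_1 \gamma}{1 + \epsilon_1} \Big)$$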

18 For a rough understanding, ignore the $\epsilon_1$ and suppose that the message
were encoded in binary. It is intuitive that any good estimate of the $\check{X}_t$ state
is going to agree with $\check{X}_t$ in all the high order bits. Since the system is
unstable, all the encoded bits eventually become high-order bits as time goes
on. So no bit error could persist for too long and still keep the estimate close
to $\check{X}_t$. The $\epsilon_1$ in the encoding is a technical device to make this reasoning
hold uniformly for all bit strings, rather than merely "typical" ones. This is
important since we are aiming for exponentially small bounds and so cannot
neglect rare events.

19 For the next section, it is convenient to have the disturbances balanced
around zero and so we choose to represent the bit $S_i$ as $+1$ or $-1$ rather
than the usual 1 or 0.



Fig. 6. The data bits are used to sequentially refine a point on a Cantor set. 
Its natural tree structure allows bits to be encoded sequentially. The Cantor 
set also has finite gaps between all points corresponding to bit sequences that 
first differ in a particular bit position. These gaps allow the uniformly reliable 
extraction of bit values from noisy observations. 

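Notice that this worst-case gap$^{21}$ is a positive number that is
growing exponentially in $t$. If the first $i - 1$ bits are the same,
then both sides can be scaled by $(2 + \epsilon_1)^{i} = \lambda^{\frac{i}{R}}$ to get the same
expressions above and so by induction, it quickly follows that
the minimum gap between the encoded state corresponding to
two sequences of bits that first differ in bit position $i$ is given
by

$$\text{gap}_i(t) = \inf_{S, \bar{S}:\ \bar{S}_i \neq S_i} |\check{X}_t(S) - \check{X}_t(\bar{S})| >
\begin{cases}
\lambda^{t - \frac{i}{R}} \Big( \frac{2 \gamma \epsilon_1}{1 + \epsilon_1} \Big) & \text{if } i \le \lfloor Rt \rfloor \\
0 & \text{otherwise}
\end{cases} \qquad (11)$$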



20 If the bits to be sent are deterministic, this is the sample space giving 
channel noise realizations. 

21 The typical gap is larger and so the probability of error is actually lower
than this bound says it is.
Because the gaps are all positive, (11) shows that it is always
possible to perfectly extract the data bits from $\check{X}_t$ by using an
iterative procedure.$^{22}$ To extract bit information from an input
$I_t$:

1) Initialize threshold $T_0 = 0$ and counter $i = 0$.

2) Compare input $I_t$ to $T_i$. If $I_t \ge T_i$, set $\hat{S}_i(t) = +1$. If
$I_t < T_i$, set $\hat{S}_i(t) = -1$.

3) Increment counter $i$ and update threshold
$T_i = \gamma \lambda^{t} \sum_{k=0}^{i-1} (2 + \epsilon_1)^{-k} \hat{S}_k$.

4) Go to step 2 as long as $i \le \lfloor Rt \rfloor$.

Since the gaps given by (11) are always positive, the
procedure works perfectly if applied to input $I_t = \check{X}_t$. At
the decoder, apply the procedure to $I_t = -\tilde{X}_t$ instead.

With this, (9) is easy to verify by looking at the complementary
event $\{\omega \mid |X_t| < \lambda^{t - \frac{j}{R}} (\frac{\gamma \epsilon_1}{1 + \epsilon_1})\}$. The bound (11) thus implies
that we are less than halfway across the minimum gap for
bit $j$ at time $t$. Consequently, there is no error in the step 2
comparison of the procedure at iterations $i \le j$. □
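The encoding of (5)-(8) and the threshold procedure above can be sketched in Python as follows. This is only an illustrative sketch: the parameter values ($\lambda = 2$, $R = 1/2$, $\Omega = 1$), the function names, and the sample bit string are assumptions chosen so that all quantities are exact powers of two; bits are indexed from 0 as in (5).

```python
import math

lam, R, Omega = 2.0, 0.5, 1.0                     # assumed values, R < log2(lam)
eps1 = 2.0 ** (math.log2(lam) / R) - 2.0          # equation (7)
gamma = Omega / (2.0 * lam ** (1.0 + 1.0 / R))    # equation (8)
c = 2.0 + eps1                                    # base of the Cantor-set expansion


def nbits(t):
    """Index of the last bit available by time t, i.e. floor(R t)."""
    return math.floor(R * t)


def state(bits, t):
    """Disturbance-driven state at time t, as in equation (5)."""
    return gamma * lam ** t * sum(c ** (-k) * bits[k] for k in range(nbits(t) + 1))


def disturbance(bits, t):
    """Simulated disturbance W_t carrying the newly arrived bits, equation (6)."""
    new = range(nbits(t) + 1, nbits(t + 1) + 1)
    return gamma * lam ** (t + 1) * sum(c ** (-k) * bits[k] for k in new)


def decode(I, t):
    """Threshold procedure: recover bits 0..floor(R t) from an input I."""
    est = []
    for i in range(nbits(t) + 1):
        T = gamma * lam ** t * sum(c ** (-k) * est[k] for k in range(i))
        est.append(1 if I >= T else -1)
    return est


def half_gap(j, t):
    """Half of the minimum gap for bit j at time t, the threshold used in (9)."""
    return lam ** (t - j / R) * gamma * eps1 / (1.0 + eps1)


bits = [1, -1, -1, 1, 1, -1, 1, 1, -1, -1, 1]    # hypothetical data bits
t = 20
clean = decode(state(bits, t), t)
noisy = decode(state(bits, t) - 0.9 * half_gap(5, t), t)
```

In this sketch the clean state decodes to every bit, while a perturbation smaller than the half gap for bit 5 is only guaranteed to leave bits 0 through 5 intact, which mirrors the error event in (9).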

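3) Probability of error for bounded moment and other
senses of stability: Proof of Theorem 3.3: Using Markov's
inequality:

$$\mathcal{P}(|X_t| > m) = \mathcal{P}(|X_t|^{\eta} > m^{\eta}) \le E[|X_t|^{\eta}]\, m^{-\eta} < K m^{-\eta}$$

Combining with Lemma 3.1 gives:

$$\mathcal{P}(\hat{S}_1^i(t) \neq S_1^i(t)) \le \mathcal{P}\Big( |X_t| \ge \lambda^{t - \frac{i}{R}} \Big( \frac{\gamma \epsilon_1}{1 + \epsilon_1} \Big) \Big)
< K \Big( \frac{1}{\gamma} + \frac{1}{\gamma \epsilon_1} \Big)^{\eta} \lambda^{-\eta (t - \frac{i}{R})}
= \Big( K \Big( \frac{1}{\gamma} + \frac{1}{\gamma \epsilon_1} \Big)^{\eta} \Big) 2^{-(\eta \log_2 \lambda)(t - \frac{i}{R})}$$

Since $t - \frac{i}{R}$ represents the delay between the time that bit $i$
was ready to be sent and the decoding time, the theorem is
proved. □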

All that was needed from the bounded moment sense of 
stability was some bound on the probability that $X_t$ took on
large values. Thus, the proof above immediately generalizes 
to other senses of stochastic stability if we suitably generalize 
the sense of anytime capacity to allow for other bounds on the 
probability of error with delay. 

Definition 3.4: A rate $R$ communication system achieves
$g$-anytime reliability given by a function $g(d)$ if

$$\mathcal{P}(\hat{M}_1^{t-d}(t) \neq M_1^{t-d}) < g(d)$$

$g(d)$ is assumed to be 1 for all negative values of $d$.

The $g$-anytime capacity $C_{g\text{-any}}(g)$ of a noisy channel is
the least upper bound of the rates $R$ at which the channel can
be used to construct a sequential communication system that
achieves $g$-anytime reliability given by the function $g(d)$.

Notice that for $\alpha$-anytime capacity, $g(d) = K 2^{-\alpha d}$ for some $K$.

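Theorem 3.5: For a given noisy channel and decreasing
function $f(m)$, if there exists an observer $\mathcal{O}$ and controller
$\mathcal{C}$ for the unstable scalar system that achieves $\mathcal{P}(|X_t| > m) <
f(m)$ for all sequences of bounded driving noise $|W_t| \le \frac{\Omega}{2}$,
then $C_{g\text{-any}}(g) \ge \log_2 \lambda$ for the noisy channel considered
with the encoder having access to noiseless feedback and $g(d)$
having the form $g(d) = f(K \lambda^{d})$ for some constant $K$.
Proof: For any rate $R < \log_2 \lambda$,

$$\mathcal{P}(\hat{S}_1^i(t) \neq S_1^i(t)) \le \mathcal{P}\Big( |X_t| \ge \lambda^{t - \frac{i}{R}} \Big( \frac{\gamma \epsilon_1}{1 + \epsilon_1} \Big) \Big) \le f\Big( \frac{\gamma \epsilon_1}{1 + \epsilon_1} \lambda^{t - \frac{i}{R}} \Big)$$

Since the delay $d = t - \frac{i}{R}$, the theorem is proved. □

22 This is a minor twist on the procedure followed by serial A/D converters.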



C. Implications 

At this point, it is interesting to consider a few implications
of Theorem 3.5.

1) Weaker senses of stability than $\eta$-moment: There are
senses of stability weaker than specifying a specific $\eta$-th
moment or a specific tail decay target $f(m)$. An example
is given by the requirement $\lim_{m \to \infty} \mathcal{P}(|X_t| > m) = 0$
uniformly for all $t$. This can be explored by taking the limit of
$C_{\text{any}}(\alpha)$ as $\alpha \downarrow 0$. We have shown elsewhere [33], [23] that:

$$\lim_{\alpha \downarrow 0} C_{\text{any}}(\alpha) = C$$

where $C$ is the Shannon classical capacity. This holds for all
discrete memoryless channels since the $\alpha$-anytime reliability
goes to zero at Shannon capacity but is $> 0$ for all lower
rates even without feedback being available at the encoder.
Thus, classical Shannon capacity is the natural candidate for
the relevant figure of merit.

To see why Shannon capacity cannot be beaten, it is
useful to consider an even more lax sense of stability. Suppose
the requirement were only that $\lim_{m \to \infty} \mathcal{P}(|X_t| > m) =
10^{-5} > 0$ uniformly for all $t$. This imposes the constraint
that the probability of a large state stays below $10^{-5}$ for all
time. Theorem 3.5 would thus only require the probability
of decoding error to be less than $10^{-5}$. However, Wolfowitz'
strong converse to the coding theorem [1] implies that since
the block-length in this case is effectively going to infinity,
the Shannon capacity of the noisy channel still must satisfy
$C \ge \log_2 \lambda$. Adding a finite tolerance for unboundedly large
states does not get around the need to be able to communicate
$\log_2 \lambda$ bits reliably.

2) Stronger senses of stability than $\eta$-moment: Having $f$
decrease only as a power law might not be suitable for certain
applications. Unfortunately, this is all that can be hoped for
in generic situations. Consider a DMC with no zero entries
in its transition matrix. Define $\rho = \min_{i,j} p(i, j)$. For such a
channel, with or without feedback, the probability of error after
$d$ time steps is lower bounded by $\rho^{d}$ since that lower bounds
the probability of all channel output sequences of length $d$.
This implies that the probability of error can drop no more
than exponentially in $d$ for such DMCs. Tighter upper-bounds
on anytime reliability with feedback are available in [34] and
[6].

Theorem 3.5 therefore implies that the only $f$-senses of
stability which are possible over such channels are those for
which:

$$f(K \lambda^{d}) \ge \rho^{d}$$

$$f(m) \ge \rho^{\frac{\log_2 (m/K)}{\log_2 \lambda}}$$

$$f(m) \ge K' m^{-\frac{\log_2 (1/\rho)}{\log_2 \lambda}}$$

which is a power law. This rules out the "risk sensitive" sense
of stability in which $f$ is required to decrease exponentially.
In the context of Theorem 3.3, this also implies that there is
an $\eta$ beyond which all moments must be infinite!
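The power-law bound above can be evaluated numerically. The sketch below computes the exponent $\frac{\log_2(1/\rho)}{\log_2 \lambda}$, which by Theorem 3.3 and the $\rho^{d}$ lower bound is the largest $\eta$ for which an $\eta$-th moment could possibly stay bounded, together with the tail lower bound $\rho^{\log_\lambda (m/K)}$. The function names, the default $K = 1$, and the sample values of $\rho$ and $\lambda$ are assumptions for illustration only.

```python
import math


def moment_threshold(rho, lam):
    """Exponent log2(1/rho) / log2(lam) of the power-law tail lower bound."""
    return math.log2(1.0 / rho) / math.log2(lam)


def tail_lower_bound(m, rho, lam, K=1.0):
    """Lower bound rho**d on f(m), where m = K * lam**d."""
    d = math.log2(m / K) / math.log2(lam)
    return rho ** d


# Hypothetical channel whose smallest transition probability is 1/4, plant gain 2.
eta_max = moment_threshold(0.25, 2.0)
bound_at_16 = tail_lower_bound(16.0, 0.25, 2.0)
```

For these assumed values the tail cannot decay faster than $m^{-2}$, so no controller over such a channel could keep moments of order above 2 bounded.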

Corollary 3.1: If any unstable process is controlled over 
a discrete memoryless channel with no feedback zero-error 
capacity, then the resulting state can have at best a power-law 
bound (Pareto distribution) on its tail. 

This is very much related to how sequential decoding must 
have computational effort distributions with at best a Pareto 
distribution[35]. In both cases, the result follows from the 
interaction of two exponentials. The difference is that the 
computational search effort distributions assumed a particular 
structure on the decoding algorithm while the bound here 
is fundamental to the stabilization problem regardless of the 
observers or controllers. 

Thus for DMCs and a given $\lambda$, we are either limited to a
power-law tail for the controlled state because of an anytime 
reliability that is at most singly exponential in delay or it is 
possible to hold the state inside a finite box since there is 
adequate feedback zero-error capacity. Nothing in between 
can happen with a DMC. 

3) Limiting the controller effort or memory: If there were a
hard limit on actuator effort ($|U_t| \le U$ for some $U > 0$), then
the only way to maintain stability is to also have a hard limit on
how big the state $X$ can get. Theorem 3.5 immediately gives
a fundamental requirement for feedback zero-error capacity
$\ge \log_2 \lambda$ since $g(d) = 0$ for sufficiently large $d$.

Similarly, consider limited-memory time-invariant con- 
trollers which only have access to the past k channel outputs. If 
the channel has a finite output alphabet and no randomization 
is permitted at the controller, limited memory immediately 
translates into only a finite number of possible control inputs. 
Since there must be a largest one, it reduces to the case of 
having a hard limit on actuator effort. 

We conjecture that even with randomization and time-variation,
finite memory at the controller implies that the
channel must have feedback zero-error capacity $\ge \log_2 \lambda$.
Intuitively, if the channel has zero-error capacity $< \log_2 \lambda$,
it can misbehave for arbitrarily long times and build up a
huge "backlog" of uncertainty that cannot be resolved at the
controller. With finite memory, the controller has no way of
knowing what uncertainty it is actually facing and so is unable
to properly interpret the channel outputs to devise the proper
control signals.

4) The AWGN case with an average input power constraint:
The tight relationship between control and communication
established in Theorem 3.5 allows the construction of sequential
codes for noisy channels with noiseless feedback if
we know how to stabilize linear plants over such channels.
Consider the problem of stabilizing an unstable plant driven by
finite variance driving noise over an AWGN channel. A linear
observer and controller strategy achieves mean-square stability
for such systems since the problem fits into the standard LQG
framework [14].

By looking more closely at the actual tail probabilities
achieved by the linear observer/controller strategy, we obtain
a natural anytime generalization of Schalkwijk and Kailath's
scheme [36], [37] for communicating over the power-constrained
additive white Gaussian noise channel with noiseless
feedback. Its properties are summarized in Figure 7, but the
highlight is that it achieves doubly exponential reliability with
delay, universally over all sufficiently long delays.

Theorem 3.6: It is possible to communicate bits reliably
across a discrete-time average-power constrained AWGN
channel with noiseless feedback at any rate $R < \frac{1}{2} \log_2 (1 + \frac{P}{\sigma^2})$
while achieving a $g$-anytime reliability of at least

$$g(d) = 2 e^{-K 4^{Rd}} \qquad (12)$$

for some constant $K$ that depends only on the rate $R$, power
constraint $P$, and channel noise power $\sigma^2$.
Proof: To avoid having to drag $\sigma^2$ around, just normalize units
so as to consider power constraint $P' = \frac{P}{\sigma^2}$ and a channel with
iid unit variance noise $N_t$. Choose the $\lambda$ for the simulated (1)
so that $R < \log_2 \lambda < \frac{1}{2} \log_2 (1 + P')$.

The observer/encoder used is a linear map:

$$a_t = \beta X_t \qquad (13)$$

so the channel output $B_t = \beta X_t + N_t$. Use a linear controller:

$$U_t = -\lambda \phi B_t \qquad (14)$$

giving the closed-loop system:

$$X_{t+1} = \lambda (1 - \beta \phi) X_t + W_t - \lambda \phi N_t \qquad (15)$$

where the $\beta, \phi$ are constants to be chosen. For the closed-loop
system to be stable:

$$0 < \lambda (1 - \beta \phi) < 1 \qquad (16)$$

Thus $\beta \phi \in (1 - \frac{1}{\lambda}, 1)$. Assuming (16) holds and temporarily
setting the $W_t = 0$ for analysis, it is clear that the closed-loop
$X_t$ is Gaussian with a growing variance asymptotically
tending to

$$\sigma_x^2 = \frac{\lambda^2 \phi^2}{1 - \lambda^2 (1 - \beta \phi)^2}$$

The channel input power satisfies:

$$E[a_t^2] \le \frac{\lambda^2 (\beta \phi)^2}{1 - \lambda^2 (1 - \beta \phi)^2} \qquad (17)$$

Since $\lambda^2 < 1 + P'$, define $P'' = \lambda^2 - 1 < P'$ and substitute
to get:

$$E[a_t^2] \le \frac{(P'' + 1)(\beta \phi)^2}{1 - (P'' + 1)(1 - \beta \phi)^2} \qquad (18)$$
By setting $\beta \phi = \frac{P''}{P'' + 1}$, the right hand side of (18) is identically
$P''$ as desired. All that remains is to verify the stability
condition (16):

$$\lambda (1 - \beta \phi) = \frac{\lambda}{P'' + 1} = \frac{\sqrt{P'' + 1}}{P'' + 1} = \frac{1}{\sqrt{P'' + 1}} < 1$$

So the closed loop system is stable and the channel noise alone
results in an average input power of at most $P'' < P'$.

Rather than optimizing the choice of $\beta$ and $\phi$ to get the best
tradeoff point, just set $\beta = 1$ and $\phi = \frac{P''}{P'' + 1}$ for simplicity. In
that case, $\sigma_x^2 = P''$.
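The choice $\beta = 1$, $\phi = \frac{P''}{P''+1}$ can be checked numerically with a short Python sketch that iterates the variance recursion implied by (15) with $W_t = 0$ and unit-variance channel noise. The function names, the sample value $\lambda = 1.5$, and the number of iterations are assumptions for illustration; the recursion itself is a deterministic variance update, not a Monte Carlo simulation.

```python
import math


def awgn_linear_design(lam):
    """Return (P'', beta, phi, closed-loop gain) for the simple choice beta = 1."""
    P2 = lam ** 2 - 1.0                  # P'' = lambda^2 - 1
    beta = 1.0
    phi = P2 / (P2 + 1.0)
    gain = lam * (1.0 - beta * phi)      # closed-loop multiplier in (15)
    return P2, beta, phi, gain


def stationary_variance(lam, beta, phi, steps=2000):
    """Iterate var <- gain^2 * var + (lam * phi)^2, starting from a zero state."""
    gain = lam * (1.0 - beta * phi)
    var = 0.0
    for _ in range(steps):
        var = gain * gain * var + (lam * phi) ** 2
    return var


lam = 1.5                                # assumed plant gain
P2, beta, phi, gain = awgn_linear_design(lam)
var = stationary_variance(lam, beta, phi)
```

Under these assumptions the iterated variance settles at $P''$ and the closed-loop gain equals $\frac{1}{\sqrt{P''+1}}$, matching the stability check and the value of $\sigma_x^2$ stated above.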

Now consider the impact of the $W_t$ alone on the closed-loop
control system. These are going through a stable system
and so by expanding the recursion (15) and setting $N_t = 0$,

$$|X^{w}_t| \le \sum_{i=0}^{t} \big( \lambda (1 - \beta \phi) \big)^{i}\, \frac{\Omega}{2}
\le \frac{\Omega}{2} \sum_{i=0}^{\infty} \Big( \frac{1}{\sqrt{P'' + 1}} \Big)^{i}
= \frac{\Omega \sqrt{P'' + 1}}{2 (\sqrt{P'' + 1} - 1)}$$

which is a constant that can be made as small as desired by
choice of $\Omega$. Assume that the data stream $S$ to be transmitted
is independent of the channel noise $N$. Then, the total average
input power is bounded by:

$$E[a_t^2] \le P'' + \Big( \frac{\Omega \sqrt{P'' + 1}}{2 (\sqrt{P'' + 1} - 1)} \Big)^2
= P'' + \frac{\Omega^2 (P'' + 1)}{4 \big( P'' + 2 (1 - \sqrt{P'' + 1}) \big)}$$

Since $P'' < P'$, we can choose an $\Omega$ small enough so that the
channel input satisfies the average power constraint regardless
of the message bits to be sent.

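All that remains is to see what $f(m)$ this control system
meets for such arbitrary, but bounded, disturbances. $X_t$ is
asymptotically the sum of a Gaussian with zero mean and
variance $P''$ together with the closed-loop impact of the
disturbance $X^{w}_t$. Since the total impact of the disturbance
part is bounded:

$$\mathcal{P}(|X_t| > m) \le \mathcal{P}\Big( |N_{\sigma_x}| > m - \frac{\Omega \sqrt{P'' + 1}}{2 (\sqrt{P'' + 1} - 1)} \Big)
= \mathcal{P}\Big( |N| > \frac{1}{\sqrt{P''}} \Big( m - \frac{\Omega \sqrt{P'' + 1}}{2 (\sqrt{P'' + 1} - 1)} \Big) \Big)
\le 2 \exp\Big( -\frac{1}{2 P''} \Big( m - \frac{\Omega \sqrt{P'' + 1}}{2 (\sqrt{P'' + 1} - 1)} \Big)^2 \Big)$$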

Ignoring the details of the constants, this gives an $f(m) =
2 e^{-K_1 (m - K_2)^2} = 2 e^{-K_1 (m^2 - 2 K_2 m + K_2^2)}$. Applying Theorem 3.5
immediately gives (12) since $\lambda^{d} > 2^{Rd}$. □

                     | Schalkwijk/Kailath         | Anytime generalization
Unstable plant:      | $R < \log_2 \lambda < C$   | $R < \log_2 \lambda < C$
Initial condition:   | bounded                    | zero
Disturbance:         | zero                       | bounded
Stability sense:     | almost-sure [18]           | exponential tail

Fig. 7. Quick comparison of the Schalkwijk/Kailath scheme to the anytime
generalization in this paper.

Since the convergence is double exponential, it is faster than
any exponential and hence

$$C_{\text{any}}(\alpha) = \frac{1}{2} \log_2 \Big( 1 + \frac{P}{\sigma^2} \Big)$$

for all $\alpha > 0$ on the AWGN channel. If the additive channel
noise were not Gaussian, but had bounded support with the
same variance, then this proof immediately reveals that the
zero-error capacity of such a bounded noise channel with
feedback satisfies: $C_0 \ge \frac{1}{2} \log_2 (1 + \frac{P}{\sigma^2})$.

In the Gaussian case, it is not immediately clear whether
there are ideas analogous to those in [38] that can be used
to further boost the $g$-anytime reliability beyond double exponential.
It is clear that if it were possible, it would require
nonlinear control strategies.

The AWGN case is merely one example. Theorem 3.5
gives a way to lower-bound the anytime capacity for channels
with feedback in cases where the optimal control behavior
is easy to see. The finite moments of the closed-loop state
reveal what anytime reliability is being achieved. Often, there
is a simple upper-bound that matches up with the lower-bound,
thereby giving the anytime capacity itself. The BEC
case discussed in [16], [33], [6] is such an example. In
addition, Theorem 3.5 gives us the ability to mix and match
communication and control tools to study a problem. This is
exploited in [30], [39] to understand the feedback anytime
capacity of constrained packet erasure channels and the power
constrained AWGN+erasure channel. In [40], these results
are extended to the Gilbert-Elliott channel with feedback. It
is also exploited in [34] to lower bound the anytime reliability
achieved by a particular code for the BSC with feedback.
IV. The sufficiency of anytime capacity

A. Overview

When characterizing a noisy channel for control, the choice
of information pattern [41] can be critical [14]. The sufficiency
result is first established for cases with an explicit noiseless
feedback path from the channel outputs back to the observer.
Section IV-E takes a quick look at the simpler problem of
almost-sure stabilization when the system is undisturbed and
all the uncertainty comes from either the channel or the initial
condition. Then, in Section IV-F, the impact of viewing time
in blocks of size $n$ and only acting on the slower time-scale
is examined. Finally, Sections IV-G and IV-H give models for
boundedly noisy or quantized controls and/or observations and
show that such bounded noise can be tolerated.

To prove the sufficiency theorem addressing the situation 
illustrated in Figure 2, we need to design an observer/controller
pair that deals with the analog plant and communicates across 
the channel by using an anytime communication system. The 
anytime communication system works with noiseless feedback 
from the channel output available at the bit encoder and is 
considered a "black box." 

Theorem 4.1: For a given noisy channel, if there exists
an anytime encoder/decoder pair with access to noiseless
feedback that achieves $C_{g\text{-any}}(g) > \log_2 \lambda$, then it is possible
to stabilize an unstable scalar plant with parameter $\lambda$ that is
driven by bounded driving noise through the noisy channel
by using an observer that has noiseless access to the noisy
channel outputs. Furthermore, there exists a constant $K$ so
that $\mathcal{P}(|X_t| > m) \le g(K + \log_\lambda m)$.

To prove this theorem, explicit constructions are given for 
the observer and controller in the next sections. 



B. Observer 

Since the observer has access to the channel outputs, it
can run a copy of the controller and hence has access to
the control signals $U_t$. Since $W_t = X_{t+1} - \lambda X_t - U_t$, and
the observer receives $X_t$ from the plant, the observer also
effectively has access to the $W_t$. However, it is not sufficient
to merely encode the $W_t$ independently to some precision.$^{23}$
Instead, the observer will act as though it is working with a
virtual controller through a noiseless channel of finite rate $R$
in the manner of Example 2.1. The resulting bits will be sent
through the anytime code.

The observer is constructed to keep the state uncertainty at
the virtual controller inside a box of size $\Delta$ by using bits at
the rate $R$. It does this by simulating a virtual process $\bar{X}_t$
governed by:

$$\bar{X}_{t+1} = \lambda \bar{X}_t + W_t + \bar{U}_t \qquad (19)$$

where the $\bar{U}_t$ represent the computed actions of the virtual
controller. This gives rise to a virtual counterpart of $\tilde{X}_t$:

$$X^{U}_{t+1} = \lambda X^{U}_t + \bar{U}_t \qquad (20)$$

which satisfies the relationship $\bar{X}_t = \check{X}_t + X^{U}_t$. Because $\bar{X}_t$
will be kept within a box, it is known that $-X^{U}_t$ is close to
$\check{X}_t$. The actual controller will pick controls designed to keep
$\tilde{X}_t$ close to $X^{U}_t$.

23 This is because the unstable plant will eventually blow up even tiny
uncorrected discrepancies between the encoded and actual $W_t$.

Fig. 8. Virtual controller for $R = 1$. How the virtual state $\bar{X}$ evolves.
The window known to contain $\bar{X}_t$ will grow by a factor of $\lambda > 1$ due to the
dynamics, to size $\lambda \Delta$. Encoding the virtual control $\bar{U}_t$ with $R$ bits cuts the
window by a factor of $2^{-R}$. The disturbance then grows it by $\frac{\Omega}{2}$ on each side,
giving a new window for $\bar{X}_{t+1}$.

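Because of the rate constraint, the virtual control $\bar{U}_t$ takes
on one of $2^{\lfloor R(t+1) \rfloor - \lfloor Rt \rfloor}$ values. For simplicity of exposition,
we ignore the integer effects and consider it to be one of $2^{R}$
values$^{24}$ and proceed by induction. Assume that $\bar{X}_t$ is known
to lie within $[-\frac{\Delta}{2}, \frac{\Delta}{2}]$. Then $\lambda \bar{X}_t$ will lie within $[-\frac{\lambda \Delta}{2}, \frac{\lambda \Delta}{2}]$.
By choosing $2^{R}$ control values uniformly spaced within that
interval, it is guaranteed that $\lambda \bar{X}_t + \bar{U}_t$ will lie within
$[-\frac{\lambda \Delta}{2^{R+1}}, \frac{\lambda \Delta}{2^{R+1}}]$. Finally, the state will be disturbed by $W_t$ and
so $\bar{X}_{t+1}$ will be known to lie within $[-\frac{\lambda \Delta}{2^{R+1}} - \frac{\Omega}{2}, \frac{\lambda \Delta}{2^{R+1}} + \frac{\Omega}{2}]$.

Since the initial condition has no uncertainty, induction will
be complete if

$$\frac{\lambda}{2^{R}} \Delta + \Omega \le \Delta \qquad (21)$$

To get the minimum $\Delta$ required as a function of $R$, we can
solve for (21) being an equality. This occurs$^{25}$ when $\Delta =
\frac{\Omega}{1 - \lambda 2^{-R}}$ for every case where $R > \log_2 \lambda$. Since the slope $\frac{\lambda}{2^{R}}$
on the left hand side of (21) is less than 1, any larger $\Delta$ also
works.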

Since they arose from dividing the uncertainty window into
$2^R$ disjoint segments, it is clear that the virtual controls $\bar{U}_t$
can be encoded causally using $R$ bits per unit time. These
bits are sent to the anytime encoder for transport over the
noisy channel.
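The induction above can be checked numerically. The following Python
sketch is illustrative only: the function names, the uniform disturbance
model, and the restriction to integer $R$ are assumptions made here and
are not part of the construction in the text. It picks the minimal
$\Delta$ from (21) and simulates the virtual process (19), with the
virtual control cancelling the center of whichever of the $2^R$ cells
contains $\lambda \bar{X}_t$.

```python
import random


def min_box_size(lam, rate_bits, omega):
    """Smallest Delta with lam * 2**-R * Delta + omega <= Delta."""
    shrink = lam * 2.0 ** (-rate_bits)
    if shrink >= 1.0:
        raise ValueError("rate must exceed log2(lam)")
    return omega / (1.0 - shrink)


def simulate_virtual_state(lam, rate_bits, omega, steps, seed=0):
    """Simulate the virtual process; return (Delta, largest |x_bar| seen).

    Assumes an integer rate and disturbances uniform on [-omega/2, omega/2].
    """
    rng = random.Random(seed)
    delta = min_box_size(lam, rate_bits, omega)
    levels = 2 ** rate_bits
    cell = lam * delta / levels          # width of each quantization cell
    x_bar, worst = 0.0, 0.0
    for _ in range(steps):
        grown = lam * x_bar              # lies in [-lam*delta/2, lam*delta/2]
        idx = int((grown + lam * delta / 2) / cell)
        idx = min(max(idx, 0), levels - 1)
        center = -lam * delta / 2 + (idx + 0.5) * cell
        u_bar = -center                  # virtual control cancels the cell center
        w = rng.uniform(-omega / 2, omega / 2)
        x_bar = grown + u_bar + w
        worst = max(worst, abs(x_bar))
    return delta, worst
```

With $\lambda = 1.5$, $R = 1$ and $\Omega = 1$ the minimal box has size
$\Delta = 4$, and the simulated virtual state should never leave
$[-2, 2]$.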



C. Controller 

The controller uses the updated bit estimates from the
anytime decoder to choose a control to attempt to make the
true state $X_t$ stay close to the virtual state $\bar{X}_t$. It does this by
having a pair of internal models as shown in Figure 9.

The first, $\tilde{X}_t$ from (5), models the unstable system driven
only by the actual controls. The second is its best estimate $\hat{X}_t$,
based on the current bit estimates from the anytime decoder, of
where the unstable system should be when driven only by the virtual
controls $\bar{U}_t$. Of course, the controller does not have the exact
virtual controls, only its best estimates $\hat{U}_0^t(t)$ for them.

$$\hat{X}_{t+1}(t) = \sum_{i=0}^{t} \lambda^i \hat{U}_{t-i}(t) \qquad (22)$$

This is not given in recursive form since all of the past
estimates for the virtual controls are subject to re-estimation
at the current time $t$. The control $U_t$ is chosen to make
$\tilde{X}_{t+1} = \hat{X}_{t+1}(t)$.

$$U_t = \hat{X}_{t+1}(t) - \lambda \tilde{X}_t \qquad (23)$$

24 For the details of how to deal with fractional $R$, please see the causal
source code discussion in [33].

25 In reality, the uncertainty approaches this from below since the system
starts at the known initial condition 0.

[Figure 9: block diagram with blocks labeled "Anytime channel decoder",
"Best estimate $\hat{X}$ of net impact of virtual controls", "Internal model
for impact of past controls $\tilde{X}$", and "Multiply by $-\lambda$",
producing the control $U_t$.]

Fig. 9. The controller remembers what it did in the past and uses the anytime
decoder to get an updated sense of where the observer wants it to go. It then
applies a control designed to correct for any past errors and move the state
to be close to the virtual state controlled by the observer.

D. Evaluating stability

Proof of Theorem 4.1: With controls given by (23), the true
state $X_t$ can be written as:

$$X_t = \check{X}_t + \tilde{X}_t = \check{X}_t + \hat{X}_t(t-1) = \sum_{i=0}^{t-1} \lambda^i \left( W_{t-1-i} + \hat{U}_{t-1-i}(t-1) \right)$$

Notice that the actual state $X_t$ differs from the virtual state $\bar{X}_t$
only due to errors in virtual control estimation due to channel
noise. If there were no errors in the prefix $\hat{U}_0^{t-1-d}(t-1)$ and arbitrarily
bad errors for the $d$ most recent estimates $\hat{U}_{t-d}^{t-1}(t-1)$, then we could
start at $\bar{X}_{t-d}$ and see how much the errors could have propagated since then:

$$X_t = \lambda^d \bar{X}_{t-d} + \sum_{i=0}^{d-1} \lambda^i \left( W_{t-1-i} + \hat{U}_{t-1-i}(t-1) \right)$$

Comparing this with $\bar{X}_t$, and noticing that the maximum
possible difference between two virtual controls is $\lambda\Delta$ gives:

$$|X_t - \bar{X}_t| = \left| \sum_{i=0}^{d-1} \lambda^i \left( \hat{U}_{t-1-i}(t-1) - \bar{U}_{t-1-i} \right) \right|
\le \sum_{i=0}^{d-1} \lambda^i \left| \hat{U}_{t-1-i}(t-1) - \bar{U}_{t-1-i} \right|
< \lambda^d \Delta \sum_{i=0}^{\infty} \lambda^{-i}
= \frac{\Delta \lambda^d}{1 - \lambda^{-1}}$$

Since $|\bar{X}_t| \le \frac{\Delta}{2}$, if we know that there were no errors in the
prefix of estimated virtual controls until $d$ time steps ago, then

$$\left\{ \hat{U}_0^{t-1-d}(t-1) = \bar{U}_0^{t-1-d} \right\} \Rightarrow \left\{ |X_t| < \frac{2\Delta}{1 - \lambda^{-1}} \lambda^d \right\} \qquad (24)$$
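(24) immediately gives:

$$\begin{aligned}
P(|X_t| > m) &= P\left( |X_t| > \frac{2\Delta}{1 - \lambda^{-1}} \lambda^{\frac{\log_2 m + \log_2(1 - \lambda^{-1}) - \log_2(2\Delta)}{\log_2 \lambda}} \right) \\
&\le P\left( |X_t| > \frac{2\Delta}{1 - \lambda^{-1}} \lambda^{\left\lfloor \frac{\log_2 m + \log_2(1 - \lambda^{-1}) - \log_2(2\Delta)}{\log_2 \lambda} \right\rfloor} \right) \\
&\le g\left( \left\lfloor \frac{\log_2 m + \log_2(1 - \lambda^{-1}) - \log_2(2\Delta)}{\log_2 \lambda} \right\rfloor \right) \\
&\le g\left( K'' + \frac{\log_2 m}{\log_2 \lambda} \right)
\end{aligned}$$

where $g$ bounds the probability of error for the $g$-anytime
code and $K''$ is some constant. □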
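A small simulation can make the role of (24) concrete. The Python sketch
below is only an illustration under stated assumptions: there is no actual
anytime code, channel errors are modeled by replacing the $d$ most recent
virtual-control estimates with random allowed values, the rate is an
integer, and disturbances are uniform. The function name and parameters
are hypothetical. The controller follows (22) and (23), and the largest
observed $|X_t|$ is compared with the bound in (24).

```python
import random


def closed_loop_worst_state(lam, rate_bits, omega, d, steps, seed=0):
    """Return (largest |X_t| observed, bound from (24)) for error depth d."""
    rng = random.Random(seed)
    delta = omega / (1.0 - lam * 2.0 ** (-rate_bits))
    levels = 2 ** rate_bits
    cell = lam * delta / levels
    # Allowed virtual controls: minus the center of each quantization cell.
    values = [lam * delta / 2 - (k + 0.5) * cell for k in range(levels)]
    x = 0.0          # true state X_t
    x_bar = 0.0      # virtual state at the observer
    x_tilde = 0.0    # controller's model of the impact of past controls
    history = []     # true virtual controls
    worst = 0.0
    for t in range(steps):
        k = int((lam * x_bar + lam * delta / 2) / cell)
        k = min(max(k, 0), levels - 1)
        history.append(values[k])
        # Estimates: correct prefix, arbitrary values for the last d entries.
        est = list(history)
        for j in range(max(0, len(est) - d), len(est)):
            est[j] = rng.choice(values)
        x_hat_next = sum(lam ** i * est[t - i] for i in range(t + 1))  # (22)
        u = x_hat_next - lam * x_tilde                                 # (23)
        w = rng.uniform(-omega / 2, omega / 2)
        x = lam * x + u + w
        x_tilde = lam * x_tilde + u
        x_bar = lam * x_bar + values[k] + w
        worst = max(worst, abs(x))
    bound = 2.0 * delta * lam ** d / (1.0 - 1.0 / lam)
    return worst, bound
```

With $d = 0$ the estimates are exact, so the true state coincides with the
virtual state and stays inside the box of size $\Delta$. Short horizons are
used because the internal models grow like $\lambda^t$ and floating-point
cancellation eventually dominates.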

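Specializing to the case of $\alpha$-anytime capacity, it is clear
that:

$$P(|X_t| > m) \le K''' 2^{-\alpha \frac{\log_2 m}{\log_2 \lambda}}$$

which gives a power-law bound on the tail. If the goal is a
finite $\eta$-th moment,

$$E[|X_t|^\eta] = \int_0^\infty P(|X_t|^\eta > m)\, dm = \int_0^\infty P(|X_t| > m^{\frac{1}{\eta}})\, dm \le 1 + K''' \int_1^\infty m^{-\frac{\alpha}{\eta \log_2 \lambda}}\, dm$$

As long as $\alpha > \eta \log_2 \lambda$, the integral above converges and
hence the controlled process has a bounded $\eta$-moment.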
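The convergence condition can be read off from the exponent of the power
law. As a minimal sketch (the function names are chosen here for
illustration), the tail integral $\int_1^\infty m^{-p}\,dm$ with
$p = \alpha / (\eta \log_2 \lambda)$ equals $1/(p-1)$ when $p > 1$ and
diverges otherwise:

```python
import math


def tail_exponent(alpha, eta, lam):
    """Exponent p in the power-law tail bound m**(-p) for |X_t|**eta."""
    return alpha / (eta * math.log2(lam))


def tail_integral(alpha, eta, lam):
    """Integral of m**(-p) over [1, inf); infinite when p <= 1."""
    p = tail_exponent(alpha, eta, lam)
    return 1.0 / (p - 1.0) if p > 1.0 else math.inf
```

For example, with $\lambda = 2$ and $\eta = 2$ the threshold is
$\alpha = 2$: the integral is finite for $\alpha = 3$ and infinite for
$\alpha = 2$.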

Theorem 4.2: It is possible to control an unstable scalar
process driven by a bounded disturbance over a noisy channel
so that the $\eta$-moment of $|X_t|$ stays finite for all time if the
channel has feedback anytime capacity $C_{any}(\alpha) > \log_2 \lambda$ for
some $\alpha > \eta \log_2 \lambda$ and the observer is allowed to observe the
noisy channel outputs and the state exactly.

Aside from the usual gap between $>$ and $\ge$, this shows
that the necessity condition in Theorem 3.3 is tight. Since
there are no assumptions on the disturbance process except for
its boundedness, the sufficiency theorems here automatically
cover the case of stochastic disturbances having any sort of
memory structure as long as they remain bounded in support.

E. Almost-sure stability

Control theorists are sometimes interested in an even simpler
problem for which there is no disturbance (i.e. $W_t = 0$
for all $t$) but the initial condition $X_0$ is unknown to within
some bound $\Omega$. For this problem, the goal is ensuring that
the state $X_t$ tends to zero almost surely. This short section
constructively shows that any sufficiency result for $\eta$-stability
also extends to almost-sure stabilization. To do this, we
consider the system:

$$X'_{t+1} = \lambda' X'_t + U'_t + W'_t \qquad (25)$$

and use it to prove a key lemma:

Lemma 4.1: If it is possible to $\eta$-stabilize a persistently
disturbed system from (25) when driven by any driving noise
$W'$ bounded by $\Omega$, then there exists a time-varying observer
with noiseless access to the state and a time-varying controller
so that any undisturbed system (1) with initial condition
$|X_0| \le \frac{\Omega}{2}$, $W_t = 0$, and $0 < \lambda < \lambda'$ can be stabilized in
the sense that there exists a $K$ so that:

$$E[|X_t|^\eta] \le K \left( \frac{\lambda}{\lambda'} \right)^{\eta t} \qquad (26)$$

Proof: Since $W_t = 0$ for $t > 0$, it is immediately clear that
the system of (25) can be related to the original system of (1)
by the following scaling relationships:

$$W'_0 = X_0, \qquad W'_t = 0 \ \text{ if } t > 0, \qquad X'_t = \left( \frac{\lambda'}{\lambda} \right)^{t-1} X_{t-1} \ \text{ if } t > 0$$

It is possible to use an observer/controller design for the system
of (25) to construct one for the original system (1) through
the same mapping. The input to the observer constructed with
$X'$ in mind will just be $(\frac{\lambda'}{\lambda})^t X_t$ and the controls $U'$ just need
to be scaled down by a factor $(\frac{\lambda}{\lambda'})^t$ so that they will properly
apply to the $X_t$ system.

Since (25) can be $\eta$-stabilized, there exists a $K'$ so that for
all $t > 0$,

$$K' \ge E[|X'_t|^\eta] = E\left[ \left( \frac{\lambda'}{\lambda} \right)^{\eta (t-1)} |X_{t-1}|^\eta \right] = \left( \frac{\lambda'}{\lambda} \right)^{\eta (t-1)} E[|X_{t-1}|^\eta]$$

which immediately yields (26). □

Lemma 4.1 can be used to get almost-sure stability by
noticing that:

$$\sum_{t=0}^{\infty} E[|X_t|^\eta] \le \sum_{t=0}^{\infty} K \left( \frac{\lambda}{\lambda'} \right)^{\eta t}$$

which is bounded. It immediately follows that:

$$\lim_{t \to \infty} |X_t|^\eta = 0 \ \text{ almost surely}$$

$$\lim_{t \to \infty} X_t = 0 \ \text{ almost surely}$$

which is summarized in the following theorem:

Theorem 4.3: If it is possible to $\eta$-stabilize a persistently
disturbed system from (25) when driven by any driving noise
$W'$ bounded by $\Omega$, then there exists a time-varying observer
with noiseless access to the state and a time-varying controller
so that any undisturbed system (1) with initial condition
$|X_0| \le \frac{\Omega}{2}$, $W_t = 0$, and $0 < \lambda < \lambda'$ can be stabilized in
the almost-sure 26 sense:

$$\lim_{t \to \infty} X_t = 0 \ \text{ almost surely}$$

The important thing to notice about Lemma 4.1 and Theorem 4.3
is that they do not depend on the detailed structure of
the original problem except for the need to observe the state
perfectly at the encoder and to be able to apply controls with
perfect precision. It is clear that if either the state observation
or the control application was limited in precision, then there
would be no way to drive the state to zero almost surely.

Theorem 4.3 is used in Section V to get Corollary 5.3 which
says that for almost-sure stabilization of an undisturbed plant
across a discrete memoryless channel (DMC), Shannon capacity
larger than $\log_2 \lambda$ suffices regardless of the information
pattern.
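The scaling argument behind Lemma 4.1 can be verified on a toy trajectory.
The sketch below assumes one specific reading of the scaling relationships
(the primed system starts from $X'_0 = 0$, with $W'_0 = X_0$, $U'_0 = 0$
and $U'_t = (\lambda'/\lambda)^t U_{t-1}$ for $t \ge 1$); this reading and
the function name are assumptions made for illustration. It checks that
the primed recursion (25) then reproduces the rescaled states
$(\lambda'/\lambda)^t X_t$.

```python
def scaling_mismatch(lam, lam_prime, x0, controls):
    """Largest gap between the primed recursion and the rescaled states."""
    ratio = lam_prime / lam
    # Undisturbed original system: X_{t+1} = lam * X_t + U_t.
    xs = [x0]
    for u in controls:
        xs.append(lam * xs[-1] + u)
    # Primed system: X'_{t+1} = lam' * X'_t + U'_t + W'_t, starting at 0.
    x_prime = 0.0
    worst = 0.0
    for t in range(len(controls) + 1):
        u_prime = 0.0 if t == 0 else ratio ** t * controls[t - 1]
        w_prime = x0 if t == 0 else 0.0
        x_prime = lam_prime * x_prime + u_prime + w_prime   # X'_{t+1}
        worst = max(worst, abs(x_prime - ratio ** t * xs[t]))
    return worst
```

Because the only "disturbance" seen by the primed system is the bounded
initial condition, any design that keeps $E[|X'_t|^\eta]$ bounded forces
the original state to decay geometrically.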

26 Here, the probability is over the channel's noisy actions and any randomness
present at the observer and controller. The convergence holds for every
possible initial condition and so it does not matter if the initial condition is
included in the probability model.

F. Time in blocks and delayed observations

In the discussion so far, time has operated at the same scale
for channel uses, system dynamics, plant observations, and
control application. Furthermore, the only structural delay in
the system was the one-step-delay across the noisy channel
needed to allow the interconnection of the controller, observer,
channel, and plant to make sense. It is interesting to consider
different parts of the system operating at slightly different time
scales and to see the impact of fixed and known delays in the
system.

1) Observing and controlling the plant on a slower time
scale: In the control context, it is natural to consider cases
where the plant evolves on a slower time scale than communication.
Formally, suppose that time is grouped into blocks
of size $n$ and the observer is restricted to only encode the
value of $X_t$ at times that are integer multiples of $n$. Similarly,
suppose that the controller only takes an action 27 immediately
before the observer will sample the state. The effective system
dynamics change to

$$X_{n(k+1)} = \lambda^n X_{nk} + U_{n(k+1)-1} + W'_k \qquad (27)$$

where $W'_k = \sum_{j=0}^{n-1} \lambda^{n-1-j} W_{nk+j}$. Observe that $|W'_k|$ is
known to be bounded within an interval of size $\Omega' < \lambda^n \frac{\Omega}{\lambda - 1}$.

Essentially, everything has just scaled up by a factor of $\lambda^n$.
Thus all the results above continue to hold for a system
described by (27) at times which are integer multiples of $n$.
The rate must be larger than $\log_2 \lambda^n = n \log_2 \lambda$ bits per $n$ time
steps which translates to $\log_2 \lambda$ bits per time step. The anytime
reliability $\alpha > \eta \log_2 \lambda^n = n(\eta \log_2 \lambda)$ for delay measured in
units of $n$ time-steps translates into $\alpha > \eta \log_2 \lambda$ for delay
measured in unit time steps. This is the same as it was for the
system described by (1).
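The bookkeeping for the blocked system is easy to check numerically. In
this sketch the function names are illustrative and the uniform
disturbance model is an assumption; the formulas are the ones stated for
(27).

```python
import math
import random


def blocked_parameters(lam, omega, n):
    """Gain and disturbance-interval size of the n-step system (27)."""
    gain = lam ** n
    omega_block = omega * (lam ** n - 1.0) / (lam - 1.0)
    return gain, omega_block


def sample_block_disturbance(lam, omega, n, rng):
    """Draw W'_k = sum_j lam**(n-1-j) * W_{nk+j} with each |W| <= omega/2."""
    return sum(lam ** (n - 1 - j) * rng.uniform(-omega / 2, omega / 2)
               for j in range(n))


def rate_per_step(lam, n):
    """Required bits per unit time step when coding over blocks of n."""
    return math.log2(lam ** n) / n
```

The block disturbance bound $\Omega(\lambda^n - 1)/(\lambda - 1)$ is always
below $\lambda^n \Omega / (\lambda - 1)$, and the per-step rate is
independent of $n$.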

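The only remaining question is what happens to the state
at times within the blocks since no controls are being applied
while the state continues to grow on its own. At such times,
the state has just grown by a factor of at most $\lambda^n$ with an
additive term of at most $\lambda^n \frac{\Omega}{\lambda - 1}$.

$$\begin{aligned}
E\left[ \left( \lambda^n \left( |X_{nk}| + \frac{\Omega}{\lambda - 1} \right) \right)^\eta \right]
&\le \lambda^{\eta n} E\left[ \left( 2 \max\left( |X_{nk}|, \frac{\Omega}{\lambda - 1} \right) \right)^\eta \right] \\
&\le \lambda^{\eta n} 2^\eta \left( \left( \frac{\Omega}{\lambda - 1} \right)^\eta + \int_{(\frac{\Omega}{\lambda - 1})^\eta}^{\infty} P(|X_{nk}|^\eta > r)\, dr \right) \\
&\le \lambda^{\eta n} 2^\eta \left( \left( \frac{\Omega}{\lambda - 1} \right)^\eta + \int_0^{\infty} P(|X_{nk}|^\eta > r)\, dr \right)
\end{aligned}$$

which is finite since the original is finite. Thus: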

Theorem 4.4: If for all $\Omega > 0$, it is possible to stabilize a
particular unstable scalar system with gain $\lambda^n$ and arbitrary
disturbance signal bounded by $\Omega$ when we are allowed $n$
uses of a particular channel between when the control-system
evolves, then for any $\Omega > 0$ it is also possible to stabilize an
unstable scalar system with gain $\lambda$ that evolves on the same
time scale as the channel using an observer restricted to only
observe the system every $n$ time steps.



By simple application of Theorem 4.4 it is known that
Theorem 4.2 and similarly Theorem 3.3 continue to hold
even if the observers/controllers only get access to the analog
system at timesteps that are integer multiples of some $n$. This
is used when considering noisy observations in Section IV-H
and in the context of vector-valued states in Part II.

27 The controller can take "no action" by setting $U_t = 0$.



2) Known fixed delays: Similarly, we can study cases where
the assumed "round trip delay" is larger than one. Suppose
the control signal applied at time $t$ depends only on channel
outputs up to time $t - v$ for some $v > 0$.

It is easy to see that while this sort of deterministic delay
does degrade performance, it does not change stability. The
proof of Theorem 4.1 goes through as before. Specifically,
(22) will change to:

(28)

Everything else proceeds as before, just that in place of $d$ for
the probability of error we will have $d + v$. Specifically, in
place of (24), we now know only that:

$$|X_t| < \frac{2\Delta}{1 - \lambda^{-1}} \lambda^{d + v} = \lambda^d \frac{2\Delta\lambda^v}{1 - \lambda^{-1}} \qquad (29)$$

This is just a change in the constant factor and results in
a smaller (more negative) constant $K''$ to deal with the larger
uncertainty. This change of constant does not make a bounded
$\eta$-moment become unbounded. The result is summarized in the
following theorem:

Theorem 4.5: Theorems 4.1 and 4.2 continue to hold if the
control signal $U_t$ is required to depend only on the channel
outputs up through time $t - v$ where $v > 0$. Only the constants
grow larger.



G. Noisy or quantized controls 

The control signals $U_t$ may not be able to be set by the
controller to infinite precision. The applied control $U_t$ at
the plant might be different from the intended control $U^1_t$
generated at the controller. This section considers the case
of $\Gamma_c$-precise controls where the difference is bounded so
$|U_t - U^1_t| \le \frac{\Gamma_c}{2}$ for some constant $\Gamma_c$ to reflect the noise
at the controller. It is easy to see that the plant dynamics now
effectively change from (1) to:

$$X_{t+1} = \lambda X_t + U^1_t + (W_t + (U_t - U^1_t))$$

where the term $(W_t + (U_t - U^1_t))$ can be considered the new
bounded disturbance for the system. So in place of $\Omega$, we
simply use the new bound $\Omega + \Gamma_c$. Thus, all the previous
results continue to hold in the case of boundedly noisy control
signals.

Theorem 4.6: If for all $\Omega > 0$, it is possible to stabilize
a particular unstable scalar system with arbitrary disturbance
signal bounded by $\Omega$ given the ability to apply precise control
signals, then for all $\Gamma_c > 0$ and $\Omega > 0$, it remains possible
to stabilize the same unstable scalar system with arbitrary
disturbance signal bounded by $\Omega$ given the ability to apply
only $\Gamma_c$-precise control signals.

Fig. 10. With noisy observations, no strict partition of the line can adequately
capture the uncertainty since it can straddle the boundary of two regions. By
doubling the number of bins, it is guaranteed that the uncertainty arising from
observation noise can be contained inside a single bin.

H. Noisy or quantized observations
The observer of Section IV-B has exact knowledge of the
state $X_t$. Suppose that the observation is instead $X_{noisy}(t) =
X_t + N_t$ where $N_t$ is known to be within a bound $(-\frac{\Gamma}{2}, \frac{\Gamma}{2})$.
For example, this models situations where the input to the
encoder has already been quantized to some resolution. 28

The observer needs to ensure that the virtual state $\bar{X}$ is
within an interval of size $\Delta$. To do this, just choose a large
enough $\Delta > 2\Gamma$ so that $X_{noisy}(t)$ and $X_t$ both pick out
the same interval for the state. As Figure 10 illustrates, this
is not quite enough since the intervals used in Section IV-B
are partitions of the real line. Meanwhile, each observation
of $X_{noisy}(t)$ gives rise to an uncertainty window for $X_t \in
(X_{noisy}(t) - \frac{\Gamma}{2}, X_{noisy}(t) + \frac{\Gamma}{2})$ that might straddle a boundary
of the partition. 29 Doubling the number of intervals and having
them overlap by half ensures that the uncertainty window can
always fit inside a single interval. Such a doubling increases
the data rate by at most an additional bit. To amortize this
additional bit, Theorem 4.4 from Section IV-F is used and
time is considered in blocks of size $n$. Then, the required rate
for achievability with blocked time is $R > 1 + \log_2 \lambda^n$ bits
per $n$ time-steps or $R > \frac{1}{n} + \log_2 \lambda$ bits per time step. Since
$n$ can be made arbitrarily large, $R > \log_2 \lambda$ is good enough. Delayed
control actions also cause no new concerns. Thus, we get the
following corollary to Theorems 4.2 and 4.5:

Corollary 4.1: It is possible to control an unstable scalar
process driven by a bounded disturbance over a noisy channel
so that the $\eta$-moment of $|X_t|$ stays finite for all time if the
channel has feedback anytime capacity $C_{any}(\alpha) > \log_2 \lambda$ for
some $\alpha > \eta \log_2 \lambda$ and the observer is allowed to observe the
noisy channel outputs exactly and has a boundedly noisy view
of the state.

This is true even if the control $U_t$ is only allowed to depend
on channel outputs up through time $t - v$ where $v > 0$.

28 The quantization is assumed to be coarse, but with infinite dynamic range.
Section III-C tells us that finite dynamic range will impose the requirement
of zero-error capacity on the link.

29 This will not arise for statically quantized states since those will have
fixed boundaries. In that case, nothing needs to be done except ensuring that
the partitions respect those boundaries.



Fig. 11. Control over a noisy communication channel without explicit 
feedback of channel outputs. 

V. Relaxing feedback 

In this section, we relax the (unrealistic) assumption that
the observer can observe the outputs of the noisy channel
directly. This change of information pattern has the potential
to make the problem more difficult. In distributed control,
this was first brought out in [42] by the famous Witsenhausen
counterexample. This showed that even in the case
of LQG problems, nonlinear solutions can be optimal when
the information patterns are not classical. This same example
also showed how the "control" signals can start to play a
dual role, simultaneously being used for control and to
communicate missing information from one party to another
[43]. Information theory also has experience with the new
challenges that arise in distributed problems of source and
channel coding [44].

This section restricts the information pattern in stages. First,
we consider the problem of Figure 11 in which the observer
can see the controls but not the channel outputs. Then, we
consider the problem of Figure 12 that restricts the observer
to only see the states $X_t$. This section is divided based on the
approach rather than the problem.

In Section V-A the solutions are based on anytime codes
without feedback. These give rise to sufficient conditions
that are more restrictive than the necessary conditions of
Theorem 3.3. The main result is Theorem 5.2, a random
construction that shows it is possible, in the case of DMCs,
to have nearly memoryless time-varying observers and still
achieve stability without any feedback. All the complexity can
in principle be shifted to the controller side.

In Section V-B the solutions are based on explicitly
communicating the channel outputs back to the observer
through either the control signals or by making the plant itself
"dance" in a stable way that communicates limited information
noiselessly with no delay. Such solutions give rise to tight
sufficient conditions. These are not as constructive, but serve
to establish the fundamental connection between stabilization
and communication with noiseless feedback.

Fig. 12. Control over a noisy communication channel without any explicit
feedback path from controller to observer except through the plant.

A. Using anytime codes without feedback 

Noisefree access to the control signals is not problematic
in the case of Corollary 4.1 since the control signals are
calculated from the perfect channel feedback. Without such
perfect feedback, it is more realistic to consider only noisy
access to the control signals. Furthermore, observe that in
Section IV-B knowledge of the actual applied controls is
used to calculate $W_t$ from the observed $X_{t+1}$, $X_t$, $U_t$. Thus,
any bounded observation noise on the control signals $U_t$ just
translates into an effectively larger $\Gamma$ bound on the state
observation noise. By Corollary 4.1 any finite $\Gamma$ can be dealt
with and thus:

Corollary 5.1: It is possible to control an unstable scalar
process driven by a bounded disturbance over a noisy channel
so that the $\eta$-moment of $|X_t|$ stays finite for all time if the
channel without feedback has $C_{any}(\alpha) > \log_2 \lambda$ for some
$\alpha > \eta \log_2 \lambda$ and the observer is allowed noisy access to the
control signals and the state process as long as the noise on
both is bounded.

As discussed in [6], without noiseless feedback the anytime
capacity will tend to be considerably lower for a given $\alpha$,
and so there will be a gap between the necessary condition
established in Theorem 3.3 and the sufficient condition in
Corollary 5.1.

Next, consider the problem of Figure 12 that restricts the
observer to only see the states $X_t$. The challenge is that the
observer of Section IV-B needs to know the controls in order to
remove their effect so as to focus only on encoding the virtual
process $\bar{X}_t$. As such, a new type of observer is required:

Definition 5.1: A $\Delta$-lattice based quantizer is a map (depicted
in Figure 13) that maps inputs $X$ to integer bins $j$.
The $j$-th bin spans $(\Delta \frac{j}{2}, \Delta(\frac{j}{2} + 1)]$ and is assigned to
$X \in (\Delta(\frac{j}{2} + \frac{1}{4}), \Delta(\frac{j}{2} + \frac{3}{4})]$ near the center of the bin.



[Figure 13 annotations: a priori uncertainty for the next state given past
observations and controls; state value uncertainty at the observer; bins of
width $\Delta$ in two overlapping rows, labeled 2, 4, ..., 14 and 1, 3, ..., 13,
with periodically repeating labels for the bins.]

Fig. 13. A 15-regularly-labeled lattice based quantizer. If the observer had
known the controls, it would have centered the lattice to cover the top bar
exactly. Because it does not, one additional quantization bin must be added at
the end so that the uncertainty never covers two bins bearing the same label.



An $L$-regularly-labeled $\Delta$-lattice based quantizer is one
which outputs $j \bmod L$ when the input is assigned to bin $j$,
that is, one for which the $L$ bin labels repeat periodically.

A randomly-labeled $\Delta$-lattice based quantizer is one which
outputs $A_j$ when the input is assigned to bin $j$ where the $A_j$
are drawn iid from a specified distribution.



Lattice based quantizers have some nice properties: 

Lemma 5.1: a. If $X_{noisy}(t) = X_t + N_t$ with observation
noise $N_t \in (-\frac{\Gamma}{2}, \frac{\Gamma}{2})$, then as long as $\Delta > 2\Gamma$, the bin
$j$ selected by a $\Delta$-lattice based quantizer facing input
$X_{noisy}(t)$ is guaranteed to contain $X_t$.

b. There exists a constant $K$ depending only on $\lambda, \Delta, \Omega$ so
that if $X_t$ is within a single particular bin, then $X_{t+n}$ can
be in no more than $K\lambda^n$ possible adjacent bins whose
positions are a function of the control inputs applied
during those $n$ time periods as well as the original bin
index for $X_t$.

c. If $L > K\lambda^n$ then knowing the $L$-regular label assigned
to $X_{noisy}(t + n)$ is enough to determine a bin guaranteed
to contain $X_{t+n}$ assuming knowledge of a bin containing
$X_t$ as well as the control inputs applied during those $n$
time periods.
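Definition 5.1 and property [a] translate directly into a few lines of
code. The following Python sketch uses illustrative function names that do
not come from the text; it implements the bin assignment for overlapping
bins of width $\Delta$ spaced $\Delta/2$ apart, together with the
$L$-regular labeling.

```python
import math
import random


def lattice_bin(x, delta):
    """Index j such that x lies in (delta*(j/2 + 1/4), delta*(j/2 + 3/4)]."""
    return math.ceil(2.0 * x / delta - 0.5) - 1


def bin_extent(j, delta):
    """Bin j spans the half-open interval (delta*j/2, delta*(j/2 + 1)]."""
    return delta * j / 2.0, delta * (j / 2.0 + 1.0)


def regular_label(j, num_labels):
    """L-regular label: bin labels repeat periodically with period L."""
    return j % num_labels
```

Because every input is assigned to the bin whose central half contains it,
a perturbation smaller than $\Delta/4$ cannot move the true state outside
that bin, which is exactly property [a] when $\Delta > 2\Gamma$.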

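Proof of [a]: $X_{noisy}(t) \in (\Delta(\frac{j}{2} + \frac{1}{4}), \Delta(\frac{j}{2} + \frac{3}{4})]$ implies
$X_t \in (\Delta(\frac{j}{2} + \frac{1}{4}) - \frac{\Gamma}{2}, \Delta(\frac{j}{2} + \frac{3}{4}) + \frac{\Gamma}{2}]$. But $\frac{\Gamma}{2} < \frac{\Delta}{4}$ by
assumption and hence

$$X_t \in \left( \Delta\left(\frac{j}{2} + \frac{1}{4}\right) - \frac{\Delta}{4},\ \Delta\left(\frac{j}{2} + \frac{3}{4}\right) + \frac{\Delta}{4} \right] = \left( \Delta\frac{j}{2},\ \Delta\left(\frac{j}{2} + 1\right) \right]$$

which is the extent of the bin $j$.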

Proof of [b]: First, suppose that the control actions were
all zero during the interval in question. Because the system is
linear, without loss of generality, assume that we start in the
$j = 0$ bin, $[0, \Delta]$. After $n$ time-steps, this can reach at most
$[0, \lambda^n \Delta]$ without disturbances. The bounded disturbances can
contribute at most

$$\sum_{i=0}^{n-1} \lambda^i \frac{\Omega}{2} < \lambda^n \sum_{i=1}^{\infty} \lambda^{-i} \frac{\Omega}{2} = \lambda^n \frac{\Omega}{2(\lambda - 1)}$$

to each side, resulting in an interval with total length
$\lambda^n (\Delta + \frac{\Omega}{\lambda - 1})$.

By linearity, the effect of any control inputs is a simple
translation and therefore just translates the interval by some
positive or negative amount. Because of the overlapping nature
of the bins, a single interval can overlap with at most 2
additional partial bins at the boundaries.

Since the bins are spaced by $\frac{\Delta}{2}$, the number of possible bins
the state can be in is bounded by $2 + \lambda^n (2 + \frac{2\Omega}{\Delta(\lambda - 1)})$ and so
$K = 4 + \frac{2\Omega}{\Delta(\lambda - 1)}$ makes property [b] true.

Proof that [a],[b] $\Rightarrow$ [c]: [a] guarantees that the bin
corresponding to $X_{noisy}(t + n)$ contains $X_{t+n}$.
[b] guarantees there are only at most $K\lambda^n < L$ adjacent bins
that the state could be in. Since the modulo operation used
to assign regular labels only assigns the same label to a bin
$L$ positions away or further, all of the $K\lambda^n$ positions have
distinct labels and hence the labeling of $X_{noisy}(t + n)$ picks
out the unique correct bin. □
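The disambiguation step in property [c] amounts to modular arithmetic. As
a hedged sketch (the function names are hypothetical, and the candidate
window is assumed to be a run of $L$ consecutive bins whose first index
the decoder can compute from the previous bin and the controls):

```python
import math


def bins_constant(lam, delta, omega):
    """K = 4 + 2*omega / (delta*(lam - 1)) from the proof of property [b]."""
    return 4.0 + 2.0 * omega / (delta * (lam - 1.0))


def labels_needed(lam, delta, omega, n):
    """Smallest integer L with L > K * lam**n."""
    return math.floor(bins_constant(lam, delta, omega) * lam ** n) + 1


def decode_bin(label, first_candidate, num_labels):
    """Recover the unique bin in [first_candidate, first_candidate + L)
    whose regular label (bin index mod L) equals the received label."""
    return first_candidate + (label - first_candidate) % num_labels
```

For $\lambda = 2$, $\Delta = \Omega = 1$ and $n = 2$ this gives $K = 6$,
so $L = 25$ labels suffice, which costs $\log_2 25$ bits per two time
steps.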



Lemma 5.1 allows the observer to just use a regular $\Delta$-lattice
quantizer to translate the state positions into bins since the
control actions are side-information that is known perfectly at
the intended recipient (the controller). The overhead implied
by the constant $K$ can be amortized by looking at time in
blocks of $n$ and so does not asymptotically cost any rate. This
can be used to extend Corollary 5.1 to cases without any access
to the control. Every $n$ time-units, the observer can just apply
the appropriate regular $\Delta$-lattice quantizer and send the bin
labels through an anytime code that operates without feedback.
However, anytime codes without feedback have a natural tree
structure since the impact of the distant past must never die
out. In the stabilization context, this tree structure forces the
observer/encoder to remember the bin sequence corresponding
to all the past states. This seems wasteful since closed-loop
stability implies that the plant state will keep returning to the
bins in the neighborhood of the origin. This suggests that this
memory at the observer is not necessary.

Theorem 5.2: It is possible to control an unstable scalar
process driven by a bounded disturbance over a DMC so that
the $\eta$-moment of $|X_t|$ stays finite for all time if the channel
without feedback has random coding error exponent $E_r(R) >
\eta \log_2 \lambda$ for some $R > \log_2 \lambda$ and the observer is allowed
boundedly noisy access to the state process.

Furthermore, there exists an $n > 0$ so this is possible by using
an observer consisting of a time-varying randomly-labeled
$\Delta$-lattice based quantizer that samples the state every $n$ time
steps and outputs a random label for the bin index. The random
labels are chosen iid from $\mathcal{A}^n$ according to the distribution
that maximizes the random coding error exponent at $R$. The
controller must have access to the common randomness used
to choose the random bin labels.

Proof: Fix a rate $R > \log_2 \lambda$ for which $E_r(R) > \eta \log_2 \lambda$.
Lemma 5.1 applies to our quantizer. Pick $n$, $\Delta$ large enough
so that $2^{nR} > K\lambda^n$ where the $K$ comes from property [b]
above. This gives:

d. Conditioned on actual past controls applied, the set of
possible paths that the states $X_0, X_n, X_{2n}, \ldots$ could have
taken through the quantization bins is a subset of a
trellis that has a maximum branching factor of $2^{nR}$.
Furthermore, the total length covered by the $d$-stage
descendants of any particular bin is bounded above by
$K\lambda^{dn}$.

Not all such paths through the trellis are necessarily possible,
but all possible paths do lie within the trellis. Figure 14
shows what such a trellis looks like and Figure 15 shows its
tree-like local property. Furthermore, the labels on each bin
are iid through both time and across bins.




Fig. 14. A short segment of the randomly labeled regular trellis from the
point of view of the controller that knows the actual control signals applied
in the past. The example has $R = \log_2 3$ and $\lambda \approx 2.4$ with $\Delta$ large.

Fig. 15. Locally, the trellis looks like a tree with the nodes corresponding
to the intervals where the state might have been and the levels of the tree
correspond to the time. It is not a tree because paths can remerge, but all
labels on disjoint paths are chosen so that they are independent of each other.



Call two paths of length $t$ through the trellis disjoint with
depth $d$ if their last common node was at depth $t - d$ and the
paths are disjoint after that. Consequently:

e. If two paths are disjoint in the trellis at a depth of $d$, then
the channel inputs corresponding to the past $dn$ channel
uses are independent of each other.

The suboptimal controller just searches for the ML path 
through the trellis. The trellis itself is constructed based on 
the controller's memory of all past applied controls. Once an 
ML path has been identified, a control signal is applied based 
on the bin estimate at the end of the ML path. The control 
signal just attempts to drive the center of that bin to zero. 

Consider an error event at depth $d$. This represents the case
that the maximum likelihood path last intersected with the true
path $dn$ time steps ago. By property [d] above, the control will
be based on a state estimate that can be at most $K\lambda^{dn}$ bins
away from the true state. Thus:

f. If an error event at depth $d$ occurs at time $t$, the state
$|X_{t+n}|$ can be no larger than $K'\lambda^{(d+1)n}$ for some constant
$K' = 2\Delta K$ that does not depend on $d$ or $t$.

Property [f] plays the role of (24) in this proof.

By property [d], there are no more than $2^{dnR}$ possible false
paths that last intersected the true path $d$ stages ago. By the
memorylessness of the channel, the log-likelihood of each path
is the sum of the log-likelihood of the "prefix" of the path leading
up to $d$ stages ago and the "suffix" of the path from that
point onward. For a path that is disjoint from the true path
at a depth of $d$ to beat all paths that end up at the true final
state, the false path must have a suffix log-likelihood that beats
the suffix log-likelihood of at least the true path. Property [e]
guarantees that the channel inputs corresponding to the false
paths are pairwise independent of the true inputs for the past
$dn$ channel uses.

All that is required to apply Gallager's random block-coding
analysis of Chapter 5 in [1] is such a pairwise independence 30
between the true and false codewords for a code of length $dn$.

g. The probability that the ML path diverges from the true
path at depth $d$ is no more than $2^{-dn E_r(R)}$.

All that remains is to analyze the $\eta$-moment by combining
[g] and [f] and using the union bound to compute the expectation.

$$\begin{aligned}
E[|X_{t+n}|^\eta] &\le \sum_{d=0}^{t/n} 2^{-dn E_r(R)} \left( K' \lambda^{(d+1)n} \right)^\eta \\
&< (K' \lambda^n)^\eta \sum_{d=0}^{\infty} 2^{-dn E_r(R)} \lambda^{\eta dn} \\
&= (K' \lambda^n)^\eta \sum_{d=0}^{\infty} 2^{-dn (E_r(R) - \eta \log_2 \lambda)} \\
&< \infty
\end{aligned}$$

where the final geometric sum converges since $E_r(R) >
\eta \log_2 \lambda$. □
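The geometric series in the last step has a simple closed form, which the
following sketch evaluates. The function names and the sample parameter
values are illustrative assumptions; the expressions are those of the
displayed bound.

```python
import math


def moment_bound(k_prime, lam, n, eta, er):
    """(K' lam^n)^eta * sum_{d>=0} 2^{-d n (er - eta log2 lam)}.

    Returns infinity when er <= eta * log2(lam), where the series diverges.
    """
    gap = er - eta * math.log2(lam)
    if gap <= 0:
        return math.inf
    return (k_prime * lam ** n) ** eta / (1.0 - 2.0 ** (-n * gap))


def partial_bound(k_prime, lam, n, eta, er, depth):
    """Direct sum of 2^{-d n er} * (K' lam^{(d+1) n})^eta for d = 0..depth."""
    return sum(2.0 ** (-d * n * er) * (k_prime * lam ** ((d + 1) * n)) ** eta
               for d in range(depth + 1))
```

For instance, with $K' = 2$, $\lambda = 2$, $n = 1$, $\eta = 1$ and
$E_r(R) = 2$ the terms are $4 \cdot 2^{-d}$ and the bound evaluates to 8.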



Although the condition in Theorem l5.2l is not tight, the result 
has several nice features. First, it allows easy verification of 
sufficiency for a good channel since E r (R) is easy to calculate. 
Structurally, it demonstrates that there is no need to use very 
complex observers. The intrinsic memory in the plant can 
play the role of the memory that would otherwise need to be 
implemented in a channel code. The complexity can be shifted 
to the controller, and even that complexity is not too bad. 
Sequential decoding can be used at the controller since it is 
known to have the same asymptotic performance with respect 

'"Notice that pairwise independence is also obtained if the random labels 
were assigned using an appropriate random time-varying infinite constraint- 
length convolutional code (with the symbol-merging tricks of Figure 6.2.1 
of [1] to match the desired channel input-distribution) applied to the binary 
expansion of the integer j corresponding to the selected bin at each stage. 
Since the closed-loop system is stable, the state is presumably small and the 
bin is close to 0. As such, all of the higher-order bits in the binary expansion 
of the bin label are zeros and do not cause any computational burden when 
operating the convolutional code. This is related to the feedback convolutional 
codes with variable constraint-lengths discussed further in [6]. Because of this, 
the computational burden of running this observer is non-increasing with time. 



to delay as the ML decoder [45], [46]. Because the closed-
loop system is stable and thereby renews itself constantly,
the computational burden of running sequential decoding (and 
hence the controller) does not grow unboundedly with time 
[47]. 

Since $E_r(R,Q) > 0$ for all $R < C$ and the capacity-
achieving distribution $Q$, Theorem 5.2 can also be recast in a
weaker Shannon capacity-centric form:

Corollary 5.2: If the observer is allowed boundedly noisy
access to the plant state, and the noisy channel is a DMC with
Shannon capacity $C > \log_2 \lambda$, then there exists some $\eta > 0$
and an observer/controller pair that stabilizes the system in
closed loop so that the $\eta$-moment of $|X_t|$ stays finite for all
time.

Furthermore, there exists an $n > 0$ so this is possible
by using an observer consisting of a time-varying randomly-
labeled $\Delta$-lattice based quantizer that samples the state every
$n$ time steps and outputs a random label for the bin index.
The random labels are chosen iid from $\mathcal{A}^n$ according to
the capacity-achieving input distribution. The controller must
have access to the common randomness used to choose the
random bin labels.
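The observer described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names, the use of a seeded pseudo-random generator as the "common randomness", and the choice of `floor` to define the bins are all hypothetical and are not taken from the paper.

```python
import math
import random


def bin_index(x, delta):
    """Index of the Delta-lattice bin containing the sampled state x."""
    return math.floor(x / delta)


def random_label(shared_seed, t, j, n, alphabet, weights):
    """Length-n channel input word for bin j at sampling time t.

    The label is drawn symbol by symbol from `alphabet` with probabilities
    `weights` (standing in for the capacity-achieving distribution). Seeding
    with (shared_seed, t, j) lets the controller regenerate the same label,
    which plays the role of the common randomness.
    """
    rng = random.Random(f"{shared_seed}:{t}:{j}")
    return tuple(rng.choices(alphabet, weights=weights, k=n))
```

Because the label depends only on the current bin and time, the observer itself keeps no memory; any memory needed for decoding lives in the plant state and in the controller.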

Applying Theorem 4.3 to Corollary 5.2 immediately results
in the following new corollary:

Corollary 5.3: If the observer is allowed perfect access to
the plant state, and the noisy channel is a DMC with Shannon
capacity $C > \log_2 \lambda$, then there exists an observer/controller
pair that stabilizes the system (1) in closed loop so that:

$$ \lim_{t \to \infty} X_t = 0 \quad \text{almost surely} $$

as long as the initial condition $|X_0| \le \frac{\Omega}{2}$ and the disturbances
$W_t = 0$.

Furthermore, there exists an $n > 0$ so this is possible
by using an observer consisting of a time-varying randomly-
labeled $\Delta_t$-lattice based quantizer that samples the state every
$n$ time steps and outputs a random label for the bin index. The
$\Delta_t$ shrink geometrically with time, and the random labels are
chosen iid from $\mathcal{A}^n$ according to the capacity-achieving
input distribution. The controller must have access to the
common randomness used to choose the random bin labels.



B. Communicating the channel outputs back to the observer 

In this section, the goal is to recover the tight condition
on the channel from Theorem 4.2. To do this, we construct
a controller that explicitly communicates the noisy channel 
outputs to the observer using whatever "channels" are available 
to it. First we consider using a noiseless control signal to 
embed the feedback information. This motivates the technique 
used to communicate the feedback information by making the 
plant itself dance in a stable way that tells the observer the 
channel output. 
1) Using the controls to communicate the channel outputs:
The idea is to "cheat" 31 and communicate the channel outputs 
through the controls. The control signal is thus serving dual 
purposes — stabilization of the system and the communication 
of channel outputs. Suppose the observer had noiseless access 
to the control signals. The controller can choose to quantize 
its real-valued controls to some suitable level and then use the 
infinitely many bits remaining in the fractional part to communicate
the channel outputs to the observer. The observer can then 
extract these bits noiselessly and give them to the anytime 
encoder as noiseless channel feedback. 

Of course, this additional fractional part will introduce an 
added disturbance to the plant. One approach is to just consider 
the quantization and channel output communication terms 
together as a bounded noise on the control signals considered 
in Section IV-G. This immediately yields:

Corollary 5.4: It is possible to control an unstable scalar
process driven by a bounded disturbance over a noisy channel
so that the $\eta$-moment of $|X_t|$ stays finite for all time if the
channel has feedback anytime capacity $C_{\text{any}}(\alpha) > \log_2 \lambda$ for
some $\alpha > \eta \log_2 \lambda$ and the observer is allowed to observe the
control signals perfectly.

However, the additional disturbance introduced by the quan- 
tization of the original control signal and the introduction 
of the new fractional part representing the channel output is 
known perfectly at the controller end. Meanwhile, the output 
of the virtual-process based observer does not depend on the 
actual applied controls anyway since it subtracts them off. So 
rather than compensating for this quantization+signaling by
expanding the uncertainty $\Omega$ and thus changing the $\Delta$ at the
observer, the controller can just clean up after itself. This idea
allows us to eliminate all access to the control signals at the
observer and generalizes to many cases of countably large
channel output alphabets.

2) Removing noiseless access to the controls at the ob- 
server: There are two tricks involved. The first is the idea of 
making the plant "dance" appropriately and using the moves 
in the dance to communicate the channel outputs. The second 
idea is to introduce an artificial delay of 1 time step in the 
determination of the "non-dance" component of the control 
signals. This makes the non-dance component completely 
predictable by the observer and allows the observer to clearly 
see the dance move corrupted only by the bounded process 
disturbance. Putting it together gives: 

Theorem 5.3: Given a noisy channel with a countable alphabet,
identify the channel output alphabet with the integers
and suppose that there exist $K > 0$, $\beta > \eta$ so that the channel

31 We call this "cheating" since it violates the spirit of the requirement
against access to the channel outputs. However, it is important to establish 
this result because it points out the need for a serious future study where 
the communication constraints back from the controller to the observer are 
modeled more carefully. A more realistic model for the problem should have 
a sensor observing the plant connected via a communication channel to the 
controller. The controller is then connected to an actuator through another 
communication channel. The actuator finally acts upon the plant itself. With no 
complexity constraints, this reduces to the case studied here with the controller 
merely playing the role of a relay bridging together two communication 
channels. The relay anytime reliability will become the relevant quantity to 
study. 
Fig. 16. Overlaying messages onto the control signal and recovering the
messages at the observer. The control signal is generated based on unit-delayed 
channel outputs with the current output being communicated back. 



outputs $B_t$ satisfy: $P(|B_t| \ge i) \le K i^{-\beta}$ for all $t$ regardless
of the channel inputs.

Then, it is possible to control an unstable scalar plant driven
by a bounded disturbance over that channel so that the $\eta$-moment
of $|X_t|$ stays finite for all time if the channel has
feedback anytime capacity $C_{\text{any}}(\alpha) > \log_2 \lambda$ for some $\alpha >
\eta \log_2 \lambda$ even if the observer is only allowed to observe the
state $X_t$ corrupted by bounded noise.

Proof: The overall strategy is illustrated in Figure 16. The
channel output extraction at the observer is illustrated in
Figure 17 in the context of a channel with output alphabet
size $|\mathcal{B}| = 5$.

Let $U_t(b_0^{t-1})$ be the control that would be applied from
Theorem 4.5 as transformed by the action of Theorem 4.6 if
necessary. It only depends on the strictly past channel outputs.

Let $b_t$ be the current channel output. The control applied is:

$$ U'_t(b_0^{t}) = U_t(b_0^{t-1}) + F(b_t) - \lambda\left(U'_{t-1}(b_0^{t-1}) - U_{t-1}(b_0^{t-2})\right) \qquad (30) $$

where the function $F(b_t)$ is the "dance move" corresponding
to the channel output.

First consider the case that perfect state observations $X_t$ are
available at the observer. At time $t$ the observer can see the control
signal only as it is corrupted by the process disturbance since
$U_{t-1} + W_{t-1} = X_t - \lambda X_{t-1}$. By observing $X$ perfectly, the
observer has in effect gained boundedly noisy access to the $U$
with $\Gamma_u = \Omega$. Now suppose that the observations of $X$ were
boundedly noisy with some $\Gamma$, so that $X_{\text{noisy}}(t) = X_t + N_t$
with $|N_t| \le \Gamma$. In that case:

$$
\begin{aligned}
\left|U_{t-1} - \left(X_{\text{noisy}}(t) - \lambda X_{\text{noisy}}(t-1)\right)\right|
&= \left|U_{t-1} - (X_t - \lambda X_{t-1}) + (\lambda N_{t-1} - N_t)\right| \\
&= \left|-W_{t-1} + (\lambda N_{t-1} - N_t)\right| \\
&\le \Omega + (\lambda + 1)\Gamma
\end{aligned}
$$
Fig. 17. How to communicate the channel outputs through the plant with
state observations only. The controller restricts its main control signal to be
calculated with an extra delay of 1 time unit and then adjusts it by $-\lambda(U'_{t-1} - U_{t-1})$
to eliminate the effect of the past communication. The final control
signal applied is shifted slightly to encode which $b_t$ was received. The decoder
uses the past $b_0^{t-1}$ to align its decoding regions and then reads off $b_t$ by using
$X_{t+1} - \lambda X_t$.



In this case, the effective observation noise on the controls is
bounded by $\Gamma_u = \Omega + (\lambda + 1)\Gamma$.



Just by looking at the state and its history, the observer has
access to $U^o_t$ with the property that $|U'_t - U^o_t| \le \Gamma_u$. To ensure
decodability of $b_t$, set $F(b_t) = 3\Gamma_u b_t$ so the channel outputs
are modulated to be integer multiples of $3\Gamma_u$.



At time $t = 0$, the observer is unchanged since there is
nothing for it to learn and no applied controls. At time $t = 1$,
because of the induced delay of 1 extra time step, there are
no delayed controls ready to apply either and so the applied
control only consists of $3\Gamma_u b_0$. This is observed up to precision
$\Gamma_u$ and so the observer can uniquely recover $b_0$ and feed it to
its anytime encoder.



Assume now that the observer was successful in learning
$b_0^{t-1}$ in the past. Then it can compute the $U_t(b_0^{t-1})$ term as
well as the $U'_{t-1}(b_0^{t-1}) - U_{t-1}(b_0^{t-2})$ term using this knowledge and
can subtract both of them from its observed $U^o_t$. This leaves
only the $3\Gamma_u b_t$ term which can be uniquely decoded given that
the observation noise is no more than $\Gamma_u$ in either direction.
By induction, the observer can effectively recover the past
channel outputs from its noiseless observations of the control
signal and can thereby operate the feedback anytime-encoder
successfully.
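The induction can be checked numerically with a toy simulation. The sketch below is an illustration only and makes several assumptions that are not in the paper: the base control rule is an arbitrary placeholder, the indexing starts the dance at the first step, state observations are perfect (so $\Gamma_u = \Omega$), and the correction term is read as cancelling exactly $\lambda$ times the previous dance move $F(b_{t-1})$, which is the effect the text attributes to it.

```python
import random


def simulate_dance(outputs, lam=1.5, omega=1.0, seed=0):
    """Return the channel outputs recovered by an observer that sees only X_t."""
    rng = random.Random(seed)
    gamma_u = omega              # perfect state observations: Gamma_u = Omega
    step = 3.0 * gamma_u         # dance move F(b) = 3 * Gamma_u * b

    def base_control(past_outputs):
        # Hypothetical placeholder for U_t(b_0^{t-1}); any rule that depends
        # only on strictly past channel outputs could be used here.
        return -0.25 * sum(past_outputs[-2:])

    x = 0.0                      # known initial condition
    decoded = []                 # observer's running copy of b_0^{t-1}
    for t, b in enumerate(outputs):
        # Controller side: knows the true channel outputs.
        prev_move = step * outputs[t - 1] if t > 0 else 0.0
        applied = base_control(list(outputs[:t])) + step * b - lam * prev_move

        # Plant with a bounded disturbance |w| <= omega / 2.
        w = rng.uniform(-omega / 2.0, omega / 2.0)
        x_next = lam * x + applied + w

        # Observer side: sees the state only, so it sees applied + w.
        observed = x_next - lam * x
        prev_move_hat = step * decoded[-1] if decoded else 0.0
        residual = observed - base_control(decoded) + lam * prev_move_hat
        decoded.append(round(residual / step))   # |w| is at most step / 6
        x = x_next
    return decoded
```

Calling `simulate_dance([0, 3, 1, 4, 2])` returns the same list, since the residual at each step is $3\Gamma_u b_t$ plus a disturbance smaller than half the spacing between dance moves.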



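The communication of each channel output $b_t$ only impacts
the very next state by shifting it by $3\Gamma_u b_t$. At
the next time, it is canceled out by the correction term
$-\lambda\left(U'_{t-1}(b_0^{t-1}) - U_{t-1}(b_0^{t-2})\right)$. The non-dancing controlled
state $X'_{t+1} = (X_{t+1} - 3\Gamma_u B_t)$ has at least a power-law tail
$P(|X'_{t+1}| > x) \le K' x^{-(\eta+\epsilon)}$ for some $K'$ and $\epsilon > 0$. Then
$|X_{t+1}| \le |X'_{t+1}| + 3\Gamma_u |B_t|$ and so

$$
\begin{aligned}
E\left[|X_{t+1}|^{\eta}\right]
&= \int_0^\infty P\left(|X_{t+1}| > m^{1/\eta}\right) dm \\
&\le \int_0^\infty P\left(|X'_{t+1}| > \tfrac{1}{2} m^{1/\eta}\right) dm
 + \int_0^\infty P\left(|B_t| > \tfrac{1}{6\Gamma_u} m^{1/\eta}\right) dm \\
&\le \int_0^\infty \min\left(1, K' \left(\frac{m^{1/\eta}}{2}\right)^{-(\eta+\epsilon)}\right) dm
 + \int_0^\infty \min\left(1, K \left(\frac{m^{1/\eta}}{6\Gamma_u}\right)^{-\beta}\right) dm
\end{aligned}
$$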

Since $\beta > \eta$, this converges and so the $\eta$-moment of $X$ also
exists. □
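The role of the condition $\beta > \eta$ can be seen numerically. The following sketch is illustrative and assumes an integer-valued $|B|$ whose tail meets the bound with equality, $P(|B| \ge i) = \min(1, K i^{-\beta})$; the function name and truncation point are hypothetical.

```python
def truncated_eta_moment(beta, eta, k=1.0, i_max=10**4):
    """Partial sum of E[|B|^eta] over values 1..i_max for the assumed tail."""

    def tail(i):
        return min(1.0, k * i ** (-beta))

    # P(|B| = i) = P(|B| >= i) - P(|B| >= i + 1)
    return sum((i ** eta) * (tail(i) - tail(i + 1)) for i in range(1, i_max + 1))
```

With `beta=3.0, eta=1.0` the partial sums settle as `i_max` grows, while with `beta=0.5, eta=1.0` they keep increasing, which matches the convergence condition used at the end of the proof.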

The channel output condition in Theorem 5.3 is clearly satisfied
whenever the channel has a finite output alphabet. Beyond that
case, it is satisfied in generic situations when the input alphabet
is finite and the transition probabilities $p(b|a)$ individually have
a light enough tail for each one of the finite $a$ values. 32 When
the channel input alphabet is itself countable, the condition is
harder to check.

If information must flow noiselessly from the controller to 
the observer, the key question is to quantify the instantaneous 
zero-error capacity of the effective channel through the plant. 
Here, the bounded support of W and the unconstrained nature 
of U are critical since they allow the instantaneous zero-error 
capacity of that effective channel to be infinite. Of course, 
there remains the problem of the dual-nature of the control 
signal — it is simultaneously being asked to stabilize the plant 
as well as to feed back information about the channel outputs.
The theorem shows that the ability of the controller to move 
the plant provides enough feedback to the encoder in the case 
of finite channel output alphabets or channels with uniformly 
exponentially bounded output statistics. 

At an abstract level, the controller is faced with the problem 
of causal "writing on dirty paper" [48] where the information it 
wishes to convey in one time step is the channel output and the 
dirty paper consists of the control signals it must apply to keep 
the system stable and to counteract the effect of the writing it 
did in previous time steps. Here, the problem is finessed by 
introducing the artificial delay at the controller to ensure that 
the "dirt" is side-information known both to the transmitter 
and the receiver. For finite output alphabets, it is also possible
to take a direct "precoding" approach to do this by encoding
the channel outputs by placing the control to the appropriate
value modulo $3\Gamma_u(|\mathcal{B}| + 1)$. This is a bounded perturbation of
the control inputs and Theorem 4.6 tells us that this does not
break stability if the $\Delta$ is adjusted appropriately.
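One way to read the precoding idea is sketched below. This is an interpretation offered for illustration, not the construction from the paper: the function names are hypothetical, the output alphabet is taken to be $\{0, \ldots, |\mathcal{B}|-1\}$, and the observer is assumed to see the applied control up to an error of at most $\Gamma_u$.

```python
def precode(u, b, gamma_u, alphabet_size):
    """Move the intended control u to the nearest value that equals
    3 * gamma_u * b modulo 3 * gamma_u * (alphabet_size + 1)."""
    step = 3.0 * gamma_u
    modulus = step * (alphabet_size + 1)
    shift = (step * b - u) % modulus      # lies in [0, modulus)
    if shift > modulus / 2.0:
        shift -= modulus                  # keep the perturbation bounded
    return u + shift


def read_output(observed, gamma_u, alphabet_size):
    """Recover b from a control observed with error at most gamma_u."""
    step = 3.0 * gamma_u
    modulus = step * (alphabet_size + 1)
    return round((observed % modulus) / step) % (alphabet_size + 1)
```

The perturbation added to `u` never exceeds half the modulus in magnitude, so it is bounded as the text requires, and the decoder does not need to know `u` at all.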

Finally, it might seem that this particular "dance" by the 
plant will be a disaster for performance metrics beyond 
stabilization. This is probably true, but we conjecture that 
such implicit feedback through the plant will be usable without 
much loss of performance. If it has memory, the observer can 
notice when and how the channel has misbehaved since the 



32 For example, an AWGN channel with a hard input constraint and quantized
outputs.
plant's state will start growing rather than staying near 0. The
$\Delta$-lattice based quantizer used in the observer for Theorem 5.2
could not exploit this because it was memoryless and used 
uniformly sized bins regardless of whether the state was large 
or small. 

VI. Continuous time systems 

A. Overview 

So far, we have considered a discrete-time model for 
the dynamic system that must be stabilized over the commu- 
nication link. This has simplified the discussion by having a 
common clock that drives both the system and the uses of the 
noisy channel. In general, there will be a $\tau_c$ that represents the
time between channel uses. This allows translating everything
into absolute time units. The continuous-time system is

$$ \dot{X}(t) = \lambda X(t) + U(t) + W(t), \quad t \ge 0 \qquad (31) $$

where the bounded disturbance $|W(t)| \le \frac{\Omega}{2}$ and there is a
known initial condition $X(0) = 0$. If the open-loop system is
unstable, then $\lambda > 0$.

Sampling can be used to extend both the necessity and 
sufficiency results to the continuous time case. The basic result 
is that stability requires an anytime capacity greater than $\lambda$ nats
per second.

B. Necessity 

For necessity, we are free to choose the disturbance signal
$W(t)$ and consequently can restrict ourselves to piecewise
constant signals 33 that stay constant for time $\tau$. By sampling
at the rate $\frac{1}{\tau}$, the sampled state evolves as

$$ X(\tau(i+1)) = e^{\lambda\tau} X(\tau i) + \frac{e^{\lambda\tau} - 1}{\lambda} W_i + \int_{i\tau}^{(i+1)\tau} U(s)\, e^{\lambda(\tau(i+1) - s)}\, ds \qquad (32) $$

Notice that (32) is just a discrete time system with $\lambda' = e^{\lambda\tau}$
taking the role of $\lambda$ in (1), and the disturbance is bounded by
$\Omega' = \frac{\Omega(e^{\lambda\tau} - 1)}{\lambda}$. All that remains is to reinterpret the earlier
theorem.
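The sampled parameters are easy to compute and to sanity-check against a direct integration of (31). The sketch below is illustrative: the function names are hypothetical, and for simplicity it assumes that both the control and the disturbance are held constant over one sampling interval, so that the integral in (32) has a closed form.

```python
import math


def sampled_parameters(lam, omega, tau):
    """Return (lambda', Omega') for sampling period tau."""
    lam_prime = math.exp(lam * tau)
    omega_prime = omega * (lam_prime - 1.0) / lam
    return lam_prime, omega_prime


def step_exact(x, u, w, lam, tau):
    """One sampling interval of (31) with constant u and w, in closed form."""
    gain = (math.exp(lam * tau) - 1.0) / lam
    return math.exp(lam * tau) * x + gain * (u + w)


def step_euler(x, u, w, lam, tau, substeps=20000):
    """Same interval integrated with small forward-Euler steps, as a check."""
    h = tau / substeps
    for _ in range(substeps):
        x += h * (lam * x + u + w)
    return x
```

For example, with `lam=2.0` and `tau=0.5` the discrete-time gain is $e \approx 2.718$, and the two step functions agree closely.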

By setting $\tau = \tau_c$ to match up the sampling times to the
channel use times, it is clear that the appropriate anytime
capacity must exceed $\log_2 \lambda' = \tau_c \lambda \log_2 e$ bits per channel
use. By converting units to nats per second 34 , we get the
intuitively appealing result that the anytime capacity must be
greater than $\lambda$ nats/sec. 35 Similarly, to hold the $\eta$-th moment
finite, the probability of error must drop with delay faster
than $K 2^{-(\eta\lambda\log_2 e) d\tau_c}$ where $d$ is in units of channel uses and
thus $d\tau_c$ has units of seconds. Thus, we get the following pair
of theorems:

Theorem 6.1: For a given noisy channel and $\eta > 0$, if there
exists an observer $\mathcal{O}$ and controller $\mathcal{C}$ for the unstable scalar
continuous time system that achieves $E[|X(t)|^{\eta}] < K$ for
all $t$ and bounded driving noise signals $|W(t)| \le \frac{\Omega}{2}$, then

33 Zero-order hold.

34 Assuming that $\lambda$ is in per second units.

35 This truly justifies nats as the "natural" unit of information! 



the channel's feedback anytime capacity $C_{\text{any}}(\eta\lambda\log_2 e) \ge \lambda$
nats per second.

Theorem 6.2: For a given noisy channel and decreasing
function $f(m)$, if there exists an observer $\mathcal{O}$ and controller
$\mathcal{C}$ for the unstable continuous-time scalar system that achieves
$P(|X(t)| > m) < f(m)$ for all $t$ and all bounded driving noise
signals $|W(t)| \le \frac{\Omega}{2}$, then $C_{g\text{-any}}(g) \ge \lambda$ nats per second for
the noisy channel considered with the encoder having access to
noiseless feedback and $g(d)$ having the form $g(d) = f(Ke^{\lambda d})$
for some constant $K$.
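The unit conversion behind these statements is short enough to write out. The helper names below are hypothetical; the sketch only restates the arithmetic from the text, namely that a requirement of $\log_2 e^{\lambda\tau_c}$ bits per channel use is the same as $\lambda$ nats per second.

```python
import math


def required_rate_bits_per_use(lam, tau_c):
    """log2(lambda') = tau_c * lam * log2(e) bits per channel use."""
    return tau_c * lam * math.log2(math.e)


def required_reliability_bits_per_use(lam, eta, tau_c):
    """Anytime reliability eta * log2(lambda') per channel use of delay."""
    return eta * required_rate_bits_per_use(lam, tau_c)


def bits_per_use_to_nats_per_second(bits, tau_c):
    """Convert bits per channel use into nats per second."""
    return bits * math.log(2.0) / tau_c
```

Whatever value of `tau_c` is used, converting `required_rate_bits_per_use(lam, tau_c)` back with `bits_per_use_to_nats_per_second` gives `lam`, so the requirement does not depend on the channel-use period.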

C. Sufficiency 

For sufficiency, the disturbance is arbitrary but we are free
to sample the signal as desired at the observer and apply
piecewise constant control signals. Sampling every $\tau$ units
of time gives rise to (32), only with the roles of $W$ and $U$
reversed. It is clear that $W_i = \int_{i\tau}^{(i+1)\tau} W(s)\, e^{\lambda(\tau(i+1) - s)}\, ds$ is
still bounded by substituting in the upper and lower bounds
and then noticing that $|W_i| \le \frac{\Omega(e^{\lambda\tau} - 1)}{2\lambda}$.

Thus, the same argument above holds and the sufficiency
Theorems 4.1, 4.2 and 5.3 as well as Corollaries 5.4 and 5.1
translate cleanly into continuous time. In each, the relevant
anytime capacity must be greater than $\lambda$ nats per second. Since
the necessary and sufficient conditions are right next to each 
other, it is clear that the choice of sampling time does not 
impact the sense of stability that can be achieved. Of course, 
this need not be optimal in terms of performance. 

Finally, if the channel we face is an input power-constrained
$\infty$-bandwidth AWGN channel, more can be said. Section III-C.4
makes it clear that nothing special is required in
this case: using linear controllers and observers is good enough
if the average power constraint is high enough. But what if
the channel had a hard amplitude constraint that allowed the
encoder no more than $P$ power per unit time? In this case, it
is possible to generalize Theorem 5.2 in an interesting way.

In [49] we give an explicit construction of a feedback-free 
anytime code for the infinite bandwidth AWGN channel that 
uses a sequential form of orthogonal signaling. In the $\infty$-bandwidth
AWGN channel, pairwise orthogonality between
codewords plays the role that pairwise independence does
for DMCs. Applying that principle through the proof of
Theorem 5.2, the observer/encoder can simply be a time-
invariant regular partition of the state space with the bins being 
labeled with orthogonal pulses, each with an energy equal 
to the hard limit for the channel. 36 The encoder just pieces 
together pulses with shapes corresponding to where the state 
is at the sampling times. The controller then searches for the 
most likely path based on the channel output signal as well as 
the past control values, and then applies a control based on the 
current estimate. This approach allows the use of occasional 
bandwidth expansion to deal with unlucky streaks of channel 

36 In particular, the following sequence of pulses work with an appropriate 
scaling. For < t < r, set gi, T (t) = — sgn (sinf 4 - 2 ^)) and g-i tT (t) = 
■i-sgn ^sin( 2,r ( 2 ^~ 1 ) 1 )^ ant j zero everywhere else. Here r is the time 
between taking samples of the state. The $g_{i,\tau}$ functions are orthogonal, and
the $i$-th function is the channel input corresponding to the $i$-th lattice bin for
the plant state observation.
noise while keeping the channel input power constant. The
details of this approach are given in [50]. 

VII. A Hierarchy Of Communication Problems 

In this final section, we interpret some of the results in a 
different way inspired by the approach used in computational 
complexity theory. There, the scarce resource is the time and 
space available for computation and the asymptotic question 
is whether or not a certain family of problems (indexed by n) 
can be solved using the limited amount of resource available. 
While explicit algorithms for solving problems do play a role, 
"reductions" from one problem to another also feature promi- 
nently in relating the resource requirements among related 
problems [51]. 

In communication, the scarce resource can be thought of 
as being the available channel. 37 Problems should be ordered 
by what channels are good enough for them. We begin with 
some simple definitions and then see how they apply to 
classical results from information theory. Finally, we interpret 
our current results in this framework. 

Definition 7.1: A communication problem is a partially
specified random system together with an information pattern
and a performance objective. This is specified by
a triple: $(\mathcal{S}, \mathcal{I}, \mathcal{V})$. The partially specified random system
$\mathcal{S} = (\mathcal{S}_0, \mathcal{S}_1, \ldots)$ in which $\mathcal{S}_i$ are real valued functions on
$[0,1] \times \mathbb{R}^i$. The output of the $\mathcal{S}_i$ function is denoted $X_i$.
The information pattern $\mathcal{I}$ identifies what variables each of
the $i$-th encoders and decoders has access to. The performance
objective $\mathcal{V}$ is a statement that must evaluate to either true or
false once the entire random system is specified.

As depicted in Figure 18, the communication problem is
thus an open system that awaits interconnection with encoder,
channel, and decoder maps. The channel is a measurable map
$f_c$ from $[0,1] \times \mathbb{R}$ into $\mathbb{R}$. The encoder and decoder are both
represented by a possibly time-varying sequence of real valued
functions compatible with the information pattern $\mathcal{I}$.

Once all the maps are specified, the random system becomes
completely specified by tying them to an underlying probability
space consisting of three iid sequences $(W_i, V_i, R_i)$ of
continuous uniform random variables on $[0,1]$. The $W_i$ are
connected to the first input of $\mathcal{S}_i$ while $V_i$ is connected to the
first input of the memoryless channel. As is usual, the output of
the encoder is connected to the remaining input of the channel, 
and all the past outputs of the channel are connected to the 
decoding functions as per the information patterns. Finally, 
assume that common randomness Ri is made available to both 
the encoder and decoder so that they may do random coding if 
desired. Once everything is connected, it is possible to evaluate 
the truth or falsehood of V. 

Definition 7.2: A channel is said to solve the problem if 
there exist suitable encoder and decoder maps compatible with 
the given information pattern so that the combined random 
system satisfies the performance objective V. 

Communication problem $A$ is harder than problem $B$ if any
channel $f_c$ that solves $A$ also solves $B$.

37 This might in turn be related to other more primitive scarce resources 
like power or bandwidth available for communication. 
Fig. 18. Abstractly, a communication problem consists of a partially specified
random system consisting of a known and possibly interactive source together 
with an information pattern. The noisy channel and encoder/decoders need to 
be specified before all the random variables become properly defined. 



Each particular communication problem therefore divides 
channels into two classes: those that solve it and those that do 
not. Suitable families of communication problems, ordered by 
hardness, can then be used to sort channels as well. Channels 
that solve harder problems are better than ones that do not. The 
equivalence of certain families of communication problems 
means that they induce the same orderings on communication 
channels. This will become clearer by the examples of the next 
few sections. 

A. Classical Examples 

1) The Shannon communication problem: Shannon iden- 
tified the problem of communicating bits reliably as one of 
the core problems of communication. In our framework, this 
problem is formalized as follows: 

• $X_i = 1$ if $W_i > \frac{1}{2}$ and $X_i = 0$ otherwise. The functions
$\mathcal{S}_i$ ignore all other inputs.

• The information pattern $\mathcal{I}$ specifies that $\mathcal{D}_i$ has access
to $Z_1^i$. The encoder information pattern is complete in
the case of communication with feedback: $\mathcal{E}_i$ has access
to $X_1^i$ as well as $Z_1^{i-1}$. Without feedback, $\mathcal{E}_i$ has access
only to $X_1^i$.

• The performance objective $\mathcal{V}(\epsilon, d)$ is satisfied if $P(X_i \ne
U_{i+d}) \le \epsilon$ for every $i \ge 0$.

The Shannon communication problem naturally comes in a
pair of families $A^{f}_{\epsilon,d}$ with feedback and $A^{nf}_{\epsilon,d}$ without feedback.
These families are indexed by the tolerable probability of bit
error $\epsilon$ and end-to-end delay $d$.

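To obtain other rates $R > 0$, adjust the source functions as
follows:

$$ X_i = \frac{j}{2^{\lfloor Ri \rfloor - \lfloor R(i-1) \rfloor}} \quad \text{if} \quad W_i \in \left[ \frac{j}{2^{\lfloor Ri \rfloor - \lfloor R(i-1) \rfloor}}, \frac{j+1}{2^{\lfloor Ri \rfloor - \lfloor R(i-1) \rfloor}} \right) \text{ for integer } j \ge 0. $$

The possibly time-varying functions $\mathcal{S}_i$ ignore all other
inputs.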
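The bookkeeping in this source definition can be sketched as follows. This is an illustration with hypothetical function names: it returns the integer $j$ of the dyadic interval containing $W_i$ together with the number of bits released at step $i$, and it assumes $W_i$ lies in $[0,1]$.

```python
import math


def bits_at_step(rate, i):
    """Number of new bits released at step i: floor(R i) - floor(R (i - 1))."""
    return math.floor(rate * i) - math.floor(rate * (i - 1))


def source_symbol(w, rate, i):
    """Return (j, k): w falls in [j / 2^k, (j + 1) / 2^k) with k bits at step i."""
    k = bits_at_step(rate, i)
    j = min(int(w * 2 ** k), 2 ** k - 1)   # clip the single point w = 1.0
    return j, k
```

For a rate of 1.5 the steps alternate between releasing one and two bits, so that a total of $\lfloor Ri \rfloor$ bits has been released after $i$ steps.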

These naturally result in families $A^{f}_{R,\epsilon,d}$ and $A^{nf}_{R,\epsilon,d}$ for the
feedback and feedback-free cases respectively. It is immediately
clear that $A^{nf}_{R,\epsilon,d}$ is harder than $A^{f}_{R,\epsilon,d}$ and furthermore
problems with smaller $\epsilon$ or $d$ are harder than those with larger
ones. It is also true that $A_{R,\epsilon,d}$ is harder than $A_{R',\epsilon,d}$ whenever
$R > R'$ in that it is more challenging to communicate reliably
at a high rate rather than a low one.

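The set of channels with classical Shannon feedback capacity
of at least $R$ is therefore:

$$ \mathcal{C}^{f}_{R} = \bigcap_{\epsilon > 0} \bigcap_{R' < R} \bigcup_{d > 0} \left\{ f_c \mid f_c \text{ solves } A^{f}_{R',\epsilon,d} \right\} \qquad (33) $$

and similarly for $\mathcal{C}^{nf}_{R}$. The classical result that feedback does
not increase capacity tells us that $\mathcal{C}^{f}_{R} = \mathcal{C}^{nf}_{R}$. Because of this,
we just call them both $\mathcal{C}_R$.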

2) The zero-error communication problem: A second prob- 
lem is the one of zero error communication. It is defined 
exactly the same as the Shannon communication problem
above, except that $\epsilon = 0$.

The channels that have zero-error capacity of at
least $R$ with feedback are therefore:

$$ \mathcal{C}^{f}_{0,R} = \bigcap_{R' < R} \bigcup_{d > 0} \left\{ f_c \mid f_c \text{ solves } A^{f}_{R',0,d} \right\} \qquad (34) $$

and similarly for $\mathcal{C}^{nf}_{0,R}$. In this case, the result with and without
feedback can be different and furthermore, $\mathcal{C}^{nf}_{0,R} \subset \mathcal{C}^{f}_{0,R} \subset \mathcal{C}_R$
[25]. In this sense, zero-error communication is fundamentally
a harder problem than $\epsilon$-error communication.

3) Estimation problems with distortion constraints: Consider
iid real valued sources with cumulative distribution
functions $F_X(t) = P(X \le t)$.

• $X_i = F_X^{-1}(W_i)$ ignoring all the other inputs. This gives
the desired source statistics.

• The information patterns remain as in the Shannon problem.

• The performance objective $\mathcal{V}(\rho, D, d)$ is satisfied if
$\lim_{n \to \infty} \frac{1}{n} E\left[\sum_{i=1}^{n} \rho(X_i, U_{i+d})\right] \le D$.
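The source construction in the first bullet is the usual inverse-CDF method, and the objective in the last bullet is an average distortion at a fixed delay. The sketch below illustrates both under stated assumptions: an exponential distribution is chosen purely as an example of $F_X$, and all function names are hypothetical.

```python
import math


def exponential_inverse_cdf(w, rate=1.0):
    """F_X^{-1}(w) for the example F_X(t) = 1 - exp(-rate * t), 0 <= w < 1."""
    return -math.log(1.0 - w) / rate


def average_distortion(xs, us, d, rho):
    """Empirical (1/n) * sum_i rho(X_i, U_{i+d}) over indices where U_{i+d} exists."""
    pairs = [(xs[i], us[i + d]) for i in range(len(xs)) if i + d < len(us)]
    return sum(rho(x, u) for x, u in pairs) / len(pairs)
```

A decoder that reproduces every source symbol exactly after `d` steps gives an average distortion of zero for any distortion measure with `rho(x, x) == 0`.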

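Call these estimation problems $A^{f}_{(F_X,\rho,D,d)}$ and $A^{nf}_{(F_X,\rho,D,d)}$
(for the cases with/without feedback) and once again associate
them with the set of channels that solve them in the limit of
large delays:

$$ \mathcal{C}^{f}_{e,(F_X,\rho,D)} = \bigcap_{D' > D} \bigcup_{d > 0} \left\{ f_c \mid f_c \text{ solves } A^{f}_{(F_X,\rho,D',d)} \right\} \qquad (35) $$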



and similarly for $\mathcal{C}^{nf}_{e,(F_X,\rho,D)}$. For cases where the distortion
$\rho$ is bounded, the existing separation result can be interpreted
as follows:

$$ \mathcal{C}_{R(D)} = \mathcal{C}^{nf}_{e,(F_X,\rho,D)} = \mathcal{C}^{f}_{e,(F_X,\rho,D)} \qquad (36) $$

where $R(D)$ is the information-theoretic rate-distortion curve.

The interpretation of this separation theorem is that in 
the limit of large delays, estimation problems with a fidelity 
constraint are no harder or easier than Shannon communication 
problems dealing with bits. Both families of problems induce 
essentially the same partial order on channels. 



B. Anytime communication problems 

The anytime communication problems are natural generalizations
of the binary data communication problems above.
Everything remains as in the Shannon communication problem,
only the performance measure changes. Let $U_t =
0.X_0(t), X_1(t), X_2(t), \ldots$ when written out in binary notation.
This can always be done and the parsing of the string is unique
no matter what the rate is.

• $\mathcal{V}(K, \alpha)$ is satisfied if $P(X_i \ne X_i(i+d)) \le K 2^{-\alpha d}$ for
every $i \ge 0$, $d \ge 0$.
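The anytime objective is a requirement on a whole family of error probabilities, one for each bit index and each delay. A minimal sketch of checking it on a finite grid follows; the function name, the callable `error_prob(i, d)`, and the finite ranges are assumptions made for illustration, since the objective itself quantifies over all $i$ and $d$.

```python
def satisfies_anytime_objective(error_prob, k, alpha, max_i, max_d):
    """Check error_prob(i, d) <= K * 2^(-alpha * d) for 0 <= i <= max_i, 0 <= d <= max_d."""
    return all(
        error_prob(i, d) <= k * 2.0 ** (-alpha * d)
        for i in range(max_i + 1)
        for d in range(max_d + 1)
    )
```

For instance, a decoder whose error probability behaves like $0.5 \cdot 2^{-2d}$ meets the objective with $K = 1$, $\alpha = 2$ but not with $\alpha = 3$, illustrating that larger $\alpha$ is the harder requirement.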

Call these problems $A^{f}_{(R,\alpha,K)}$ when feedback is allowed
and $A^{nf}_{(R,\alpha,K)}$ when it is not permitted. Once again, it is
clear that the non-feedback problems are harder than the
corresponding feedback problems. Furthermore, $A_{(R,\alpha,K)}$ is
harder than $A_{(R,\alpha',K)}$ if $\alpha' < \alpha$ in addition to the usual fact
of $A_{(R,\alpha,K)}$ being harder than $A_{(R',\alpha,K)}$ if $R' < R$. Similarly,
smaller $K$ values are harder than larger ones.

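The channels with $\alpha$-anytime feedback capacity of at least
$R$ are then given by:

$$ \mathcal{C}^{f}_{a,(R,\alpha)} = \bigcap_{R' < R} \bigcap_{\alpha' < \alpha} \bigcup_{K > 0} \left\{ f_c \mid f_c \text{ solves } A^{f}_{(R',\alpha',K)} \right\} \qquad (37) $$

with a similar definition for $\mathcal{C}^{nf}_{a,(R,\alpha)}$. It is immediately clear
that

$$ \mathcal{C}^{f}_{0,R} \subseteq \mathcal{C}^{f}_{a,(R,\alpha)} \subseteq \mathcal{C}_R $$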

The case of $\alpha = 0$ is defined as the limit:

$$ \mathcal{C}^{f}_{a,(R,0)} = \bigcup_{\alpha > 0} \mathcal{C}^{f}_{a,(R,\alpha)} \qquad (38) $$

It turns out in this case that $\mathcal{C}^{f}_{a,(R,0)} = \mathcal{C}^{nf}_{a,(R,0)} = \mathcal{C}_R$ since
infinite random tree codes can be used to communicate reliably
at all rates below the Shannon capacity [23].
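However, for other $\alpha > 0$,

$$ \mathcal{C}^{nf}_{0,R} \subset \mathcal{C}^{nf}_{a,(R,\alpha)} \subset \mathcal{C}^{f}_{a,(R,\alpha)} \subset \mathcal{C}_R $$

and

$$ \mathcal{C}^{nf}_{0,R} \subset \mathcal{C}^{f}_{0,R} \subset \mathcal{C}^{f}_{a,(R,\alpha)} \subset \mathcal{C}_R $$

with all of these being strict inclusion relations. $\mathcal{C}^{f}_{0,R}$ and
$\mathcal{C}^{nf}_{a,(R,\alpha)}$ are not subsets of each other in general.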



In this sense, there is a non-trivial hierarchy of problems 
with Shannon communication as the easiest example and zero- 
error communication as the hardest. 

C. Control and the relation to anytime communication 

The stabilization problems considered in this paper are
different in that they are interactive. The formulation should
be apparent by comparing Figure 18 with Figure 2.

• $X_i$ represents the state of the scalar control problem with
unstable system dynamics given by $\lambda > 1$. The $W_i$ is the
bounded disturbance and $U_i$ represents the control signal
used to generate $X_{i+1}$.

• The information pattern with and without feedback is as
before.

• The performance objective $\mathcal{V}_{(\eta,K)}$ is satisfied if
$E[|X_i|^{\eta}] \le K$ for all $i \ge 0$.
Call this problem $A^{f}_{(\lambda,\eta,K)}$ for cases with feedback and
$A^{nf}_{(\lambda,\eta,K)}$ for cases without feedback available at the encoder.
The problem without feedback is harder than the problem with
feedback. It is also clear that $A^{f}_{(\lambda,\eta,K)}$ is harder than $A^{f}_{(\lambda',\eta,K)}$
whenever $\lambda > \lambda'$ and similarly for $A^{nf}$. The same holds if $\eta$
is made larger or $K$ is made smaller.



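$$ \mathcal{C}^{f}_{c,(\lambda,\eta)} = \bigcap_{\lambda' < \lambda} \bigcap_{\eta' < \eta} \bigcup_{K > 0} \left\{ f_c \mid f_c \text{ solves } A^{f}_{(\lambda',\eta',K)} \right\} \qquad (39) $$

with a similar definition for $\mathcal{C}^{nf}_{c,(\lambda,\eta)}$. The necessity result of
Theorem 3.3 establishes that

$$ \mathcal{C}^{nf}_{c,(\lambda,\eta)} \subseteq \mathcal{C}^{f}_{c,(\lambda,\eta)} \subseteq \mathcal{C}^{f}_{a,(\log_2 \lambda,\, \eta \log_2 \lambda)} $$

while Theorem 4.2 establishes the other direction for the case
of feedback:

$$ \mathcal{C}^{nf}_{c,(\lambda,\eta)} \subseteq \mathcal{C}^{f}_{c,(\lambda,\eta)} = \mathcal{C}^{f}_{a,(\log_2 \lambda,\, \eta \log_2 \lambda)} \qquad (40) $$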



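Meanwhile, without feedback and restricting to the set of
finite output alphabet channels (i.e., where the range of $f_c$ has
finite cardinality), denoted $\mathcal{C}_{\text{fin}}$, Theorem 5.3 implies:

$$ \mathcal{C}^{f}_{a,(\log_2 \lambda,\, \eta \log_2 \lambda)} \cap \mathcal{C}_{\text{fin}} \subseteq \mathcal{C}^{nf}_{c,(\lambda,\eta)} \cap \mathcal{C}_{\text{fin}} $$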

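Combining with (40) gives the following result for finite output
alphabet channels:

$$ \mathcal{C}^{nf}_{c,(\lambda,\eta)} \cap \mathcal{C}_{\text{fin}} = \mathcal{C}^{f}_{c,(\lambda,\eta)} \cap \mathcal{C}_{\text{fin}} = \mathcal{C}^{f}_{a,(\log_2 \lambda,\, \eta \log_2 \lambda)} \cap \mathcal{C}_{\text{fin}} \qquad (41) $$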



Finally, notice how the mapping from $(\lambda, \eta)$ to $(R, \alpha)$ is
one-to-one and onto. By setting $\lambda = 2^R$ and $\eta = \frac{\alpha}{R}$ it is
possible to translate in the opposite direction and this does
provide some additional insight. For example, in the anytime
communication problem, it is clear that increasing $R$ from 2
to 3 while keeping $\alpha$ constant at 6 results in a harder problem.
When translated to stabilization, without the results established
here, it is far from obvious that the equivalent move from
$\lambda = 4$ to $\lambda = 8$ with a simultaneous drop in the required $\eta$
from 3 to 2 is also a move in a fundamentally harder direction.
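The translation between the two parametrizations is a two-line computation. The helper names below are hypothetical; the formulas are the ones stated in the text, $R = \log_2 \lambda$, $\alpha = \eta \log_2 \lambda$ and their inverses.

```python
import math


def to_anytime(lam, eta):
    """Map a stabilization problem (lambda, eta) to anytime parameters (R, alpha)."""
    rate = math.log2(lam)
    return rate, eta * rate


def to_stabilization(rate, alpha):
    """Map anytime parameters (R, alpha) back to (lambda, eta)."""
    return 2.0 ** rate, alpha / rate
```

With these, `to_stabilization(2, 6)` gives `(4.0, 3.0)` and `to_stabilization(3, 6)` gives `(8.0, 2.0)`, which is the example worked in the paragraph above.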

D. Discussion 

Traditionally, this hierarchy of communication problems 
had not been explored since there were apparently only two 
interesting levels: problems equivalent to classical Shannon 
communication and those equivalent to zero-error commu- 
nication. Anytime communication problems are intermediate 
between the two. Though feedback anytime communication 
problems are interesting on their own, the equivalence with 
feedback stabilization makes them even more fundamental. 

It is interesting to consider where Schulman's interactive 
computation problems fit in this sort of hierarchy. Because 
a constant factor slowdown is permitted by the asymptotics, 
such problems of interactive computation do not distinguish 
between channels of different Shannon capacity. In the lan- 
guage of this section, this means that Shannon communication 
problems are harder than those of interactive computation 
considered in [3]. 



Furthermore, the noisy channel definition given here can be 
extended to include channels with memory. Simply make the 
current channel output depend on all the current and past $V_t$ 
and $Y_t$. In that case, (40) will continue to hold. Since the 
finite-output alphabet constructions never needed memorylessness, 
(41) will also hold. 

The constructive nature of the proofs for the underlying 
theorems makes them akin to the "reductions" used in theoret- 
ical computer science to show that two problems belong to the 
same complexity class. They are direct translations at the level 
of problems and solutions. In contrast, the classical separation 
results go through the mutual information characterization of 
R(D) and C. It would be interesting to study a suitable analog 
of (36) for channels with memory. Feedback can now increase 
the capacity so the with-feedback and feedback-free problems 
are no longer equivalent. However, it would be nice to see a 
direct reduction of Shannon's communication problem to an 
estimation problem that encompasses such cases as well. The 
asymptotic equivalence situation is likely even richer in the 
multiuser setting where traditional separation theorems do not 
hold. 